Current Bioinformatics - Online First
PredPVP: A Stacking Model for Predicting Phage Virion Proteins Based on Feature Selection Methods
Authors: Qian Cao, Xufeng Xiao, Yannan Bin, Jianping Zhao and Chunhou Zheng
Available online: 28 October 2024
Background: Phage therapy has broad application prospects as a novel therapeutic method. Phage Virion Proteins (PVP) can recognize the host and bind to surface receptors, which makes them important for developing antimicrobial drugs against infectious diseases caused by bacteria. In recent years, several machine learning-based PVP predictors have been developed, but they usually train the learner on a single feature type. Higher-dimensional feature representations, by contrast, tend to contain more potential sequence information.
Methods: In this work, we construct PredPVP, a stacking model for PVP prediction that combines multiple features with feature selection methods. Specifically, each sequence is first encoded using seven feature types. Three feature selection methods are then applied to this high-dimensional representation to remove redundant features, and each is combined with eight machine learning algorithms. Finally, the probability features and class features (PCFs) generated by the resulting 24 base models are fed into logistic regression (LR) to train the final model.
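The second level of this stacking design can be sketched briefly. Each of the 24 base models (apparently three feature-selection methods crossed with eight learners, since 3 × 8 = 24) emits a probability, and the PCF vector pairs those probabilities with hard class labels before a logistic regression meta-learner is trained on them. The minimal Python sketch below assumes a 0.5 threshold to derive each class feature and uses synthetic base-model outputs; the function names, threshold, and plain gradient-descent training are illustrative choices, not taken from the PredPVP code.

```python
import math
import random


def pcf_vector(base_probs, threshold=0.5):
    """Build a PCF vector: each base model's probability (PF) followed by its 0/1 class (CF).

    The 0.5 threshold is an assumption made for this sketch.
    """
    probs = [float(p) for p in base_probs]
    classes = [1.0 if p >= threshold else 0.0 for p in probs]
    return probs + classes


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=300):
    """Fit an unregularized logistic regression by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Meta-learner probability that a sample is a PVP."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy usage: synthetic outputs from 24 hypothetical base models.
N_BASE = 24
rng = random.Random(0)
X, y = [], []
for label in [1, 0] * 20:
    center = 0.75 if label == 1 else 0.25
    probs = [min(1.0, max(0.0, rng.gauss(center, 0.15))) for _ in range(N_BASE)]
    X.append(pcf_vector(probs))
    y.append(label)

w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, pcf_vector([0.9] * N_BASE)), 3))
```

In a real stacking pipeline the meta-learner is usually trained on out-of-fold base-model predictions so that it never sees outputs from models fitted on the same samples; the abstract does not say how PredPVP handled this.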
Results: On the independent test set, PredPVP outperforms other existing predictors, achieving an AUC of 93.4%.
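AUC, the metric reported here, can be read as the probability that a randomly chosen positive sample (a PVP) receives a higher score than a randomly chosen negative one, with ties counted as one half. The minimal pairwise sketch below is for illustration only; it is O(n_pos × n_neg), and rank-based formulas scale better on large test sets. The function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```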
Conclusion: We expect PredPVP to serve as a tool for large-scale PVP recognition, providing a new route to the development of novel antimicrobials and accelerating their application in actual treatment. The datasets and source code used in this study are available at https://github.com/caoqian23/PredPVP.
Hybrid Feature Extraction for Breast Cancer Classification Using the Ensemble Residual VGG16 Deep Learning Model
Available online: 28 October 2024
Introduction: Breast cancer (BC) is a major cause of death among women worldwide and will probably remain difficult to detect. Advances in medical imaging technology have improved the accuracy and efficiency of breast cancer classification. However, the complexity of tumor features and the variability of imaging data still pose challenges.
Methods: This study proposes the Ensemble Residual-VGG-16 model, a novel combination of the Deep Residual Network (DRN) and VGG-16 architectures designed specifically for breast cancer diagnosis from mammography images. We assessed its performance using accuracy, recall, precision, and F1-score.
Results: On the MIAS dataset, the Residual-VGG-16 model performed well, with an accuracy of 99.6%, precision of 99.4%, recall of 99.7%, F1-score of 98.6%, and Mean Intersection over Union (MIoU) of 99.8%. On the INBreast dataset, it achieved an accuracy of 93.8%, precision of 94.2%, recall of 94.5%, and F1-score of 93.4%.
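The metrics reported for this model follow the standard confusion-matrix definitions. In the sketch below, F1 is the harmonic mean of precision and recall, and MIoU is averaged over the two classes of a binary problem; the abstract does not state how its metrics were averaged, so both choices are assumptions, and the function names are hypothetical. Under the harmonic-mean definition, the reported MIAS precision (99.4%) and recall (99.7%) correspond to an F1 of about 99.5%, and the INBreast values (94.2%, 94.5%) to about 94.3%, both above the reported 98.6% and 93.4%; a different averaging scheme, such as per-class macro averaging, could account for the difference.

```python
def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1, and two-class mean IoU from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou_pos = tp / (tp + fp + fn)  # overlap of predicted and true positives
    iou_neg = tn / (tn + fn + fp)  # same quantity for the negative class
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "miou": (iou_pos + iou_neg) / 2,
    }


# Harmonic-mean F1 implied by the reported precision/recall pairs.
print(round(f1_from(0.994, 0.997), 3))  # MIAS
print(round(f1_from(0.942, 0.945), 3))  # INBreast
```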
Conclusion: The proposed model is a significant advancement in breast cancer diagnosis, offering high accuracy and potential for automated grading.
A Low Transformed Tubal Rank Tensor Model Using a Spatial-Tubal Constraint for Sample Clustering with Cancer Multi-omics Data
Authors: Sheng-Nan Zhang, Ying-Lian Gao, Yu-Lin Zhang, Junliang Shang, Chun-Hou Zheng and Jin-Xing Liu
Available online: 21 October 2024
Background: Because each dimension of a tensor can store a different type of genomics data, tensor-based methods, compared with matrix methods, can provide a deeper understanding of multi-dimensional data and help uncover more cancer-related information. In practice, however, prior knowledge in multi-omics data is insufficiently exploited, and the recovery of low-tubal-rank tensors is limited. The method proposed in this article was developed to address these issues.
Objective: In this paper, we proposed a low transformed tubal rank tensor model (LTTRT) using a spatial-tubal constraint to accurately partition different types of cancer samples and provide reliable theoretical support for the identification, diagnosis, and treatment of cancer.
Method: In LTTRT, the low-rank tensor is characterized by the transformed tensor nuclear norm, which is based on the transformed tensor singular value decomposition. This norm captures the global low-rank property of the tensor and addresses the problem that tensor nuclear norm-based methods do not attain the lowest tubal rank. Weighted total variation regularization is also introduced to extract more information from sequencing data in both the spatial and tubal dimensions, explore cross-correlation features among multiple genomic data types, and address the neglect of prior knowledge from various perspectives. In addition, the L1-norm is used to promote sparsity. The LTTRT model is solved iteratively with a symmetric Gauss-Seidel-based alternating direction method of multipliers (sGS-ADMM).
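The abstract names weighted total variation (TV) over spatial and tubal dimensions and an L1 sparsity term but does not give the LTTRT objective. As a hedged illustration of those two generic building blocks only (not LTTRT itself, and without the transformed tensor nuclear norm or sGS-ADMM), the sketch below computes an anisotropic weighted TV of a third-order tensor stored as nested lists, treating modes 1 and 2 as spatial and mode 3 as tubal (an assumption), and applies elementwise soft-thresholding, the standard proximal step for an L1 term inside ADMM-type solvers. The weights and function names are hypothetical.

```python
def weighted_tv(X, weights=(1.0, 1.0, 1.0)):
    """Anisotropic weighted TV of an n1 x n2 x n3 tensor given as nested lists.

    Sums weighted absolute differences between neighbours along mode 1 and
    mode 2 (assumed 'spatial') and along mode 3 (assumed 'tubal').
    """
    n1, n2, n3 = len(X), len(X[0]), len(X[0][0])
    w1, w2, w3 = weights
    total = 0.0
    for i in range(n1):
        for j in range(n2):
            for k in range(n3):
                v = X[i][j][k]
                if i + 1 < n1:
                    total += w1 * abs(X[i + 1][j][k] - v)
                if j + 1 < n2:
                    total += w2 * abs(X[i][j + 1][k] - v)
                if k + 1 < n3:
                    total += w3 * abs(X[i][j][k + 1] - v)
    return total


def shrink(v, tau):
    """Scalar soft-thresholding: proximal operator of tau * |v|."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def soft_threshold(X, tau):
    """Elementwise proximal operator of tau * ||X||_1 for a nested-list tensor."""
    return [[[shrink(v, tau) for v in tube] for tube in row] for row in X]


# Toy usage: a single nonzero entry has three neighbours, one along each mode.
X = [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
print(weighted_tv(X), weighted_tv(X, (1.0, 1.0, 2.0)))
```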
Results: Sample clustering experiments on multiple integrated cancer multi-omics datasets show that the proposed LTTRT method outperforms existing methods, validating its effectiveness in accurately partitioning different types of cancer samples.
Conclusion: The LTTRT method achieves precise segmentation of different types of cancer samples.
NEXT-GEN Medicine: Designing Drugs to Fit Patient Profiles
Authors: Raj Kamal, Diksha, Priyanka Paul, Ankit Awasthi and Amandeep Singh
Available online: 17 October 2024
Background: Personalized medicine, which tailors drug formulations to individual patient profiles, has made significant strides in healthcare. The integration of genomics, biomarkers, nanotechnology, 3D printing, and real-time monitoring provides a comprehensive approach to optimizing drug therapies for each patient. This review highlights recent advances in personalized medicine, explores how these technologies are being integrated, and discusses applications in diseases such as cancer, cardiovascular diseases, diabetes mellitus, and neurodegenerative diseases. As these technologies continue to evolve, we are entering an era of truly personalized medicine that promises improved treatment outcomes, reduced adverse effects, and a more patient-centric approach to healthcare.
Artificial Intelligence in Diabetes Mellitus Prediction: Advancements and Challenges - A Review
Authors: Rohit Awasthi, Anjali Mahavar, Shraddha Shah, Darshana Patel, Mukti Patel, Drashti Shah and Ashish Patel
Available online: 16 October 2024
Poor dietary habits and a lack of understanding are contributing to the rapid global increase in the number of people with diabetes. A framework that can accurately predict diabetes in large numbers of patients from clinical data is therefore needed. Artificial intelligence (AI) is a rapidly evolving field, and its application to diabetes, a worldwide pandemic, has the potential to transform how this chronic condition is diagnosed and predicted. AI-based algorithms have been developed to support predictive models of the risk of developing diabetes or its complications. In this review, we discuss AI-based diabetes prediction. So far, AI-based prediction of new-onset diabetes has not outperformed traditional, statistically based risk stratification models. Nevertheless, large quantities of well-organized data and abundant processing power are expected to improve AI's predictive capabilities in the near future, greatly enhancing the accuracy of diabetes prediction models.
Intersecting Peptidomics and Bioactive Peptides in Drug Therapeutics
Available online: 15 October 2024
Peptidomics is the comprehensive study of peptides, describing their functions, structures, and interactions within living organisms. It encompasses bioactive peptides, either naturally derived or synthetically designed, that exhibit therapeutic properties against microbial infections, cancer progression, inflammation, and other conditions. Current bioinformatics tools and techniques help analyse large peptidomics datasets, predict peptide structures and functions, and design peptides with enhanced stability and efficacy. Peptidomics studies are gaining importance in therapeutics because peptides offer high target specificity with minimal side effects. Their molecular size and flexibility make peptides promising drug candidates for designing protein-protein interaction inhibitors. These features contribute to their potency as drugs, reflected in the considerable increase in the number of peptide drugs on the market for various health conditions. The present review extensively analyses the peptidomics field, focusing on different bioactive peptides and therapeutics, such as anticancer peptide drugs. It also provides comprehensive information on in silico tools available for peptide research and discusses the importance of personalised peptide medicines in disease therapy, along with a case study. Finally, the major limitations of peptide drugs and different strategies to overcome them are reviewed.
The Use of Gene Expression Profiling to Predict Molecular Subtypes of Breast Cancer by a New Machine Learning Algorithm: Random Forest
Available online: 14 October 2024
Background: Breast cancer (BC) is one of the main causes of cancer-related mortality in women. There are four molecular subtypes of this malignancy, and the efficacy of adjuvant therapy differs across them. Gene expression profiles provide valuable information for patients whose prognosis is unclear from clinical markers and immunohistochemistry.
Objective: In this study, we aim to predict the molecular subtypes of BC from a gene expression dataset of BC patients and normal samples using six well-known machine-learning classifiers.
Methods: Two microarray datasets, GSE45827 and GSE140494, were downloaded from the Gene Expression Omnibus (GEO) database. These datasets comprise 21 normal tissue samples and samples from a cohort analysis of primary invasive breast cancer (57 basal, 36 HER2, 56 Luminal A, and 66 Luminal B). We used AdaBoost, Random Forest (RF), Artificial Neural Network (ANN), Naïve Bayes (NB), Classification and Regression Tree (CART), and Linear Discriminant Analysis (LDA) classifiers.
Result: The data analysis shows that the RF and NB classifiers outperform the other models in predicting BC subtype. RF performs best, with accuracy ranging from 0.89 to 1.0, compared with an average accuracy of 0.91 for its competitor NB. Our approach perfectly discriminates unaffected (normal) cases from carcinoma; for this task, RF makes no prediction errors. Additionally, we used PCA, DHWT low-frequency, and DHWT high-frequency components to reduce the dimensionality of the numerous gene expression values. With this data reduction, LDA performance improves by up to 95%. With feature selection, the best performance was recorded by RF, with a classification accuracy of 98%.
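The abstract uses "DHWT" low- and high-frequency components for dimensionality reduction without defining the acronym. Assuming it refers to a discrete Haar wavelet transform, a minimal sketch of reducing one expression profile is shown below: each level pairs adjacent values into scaled sums (low-frequency band) and scaled differences (high-frequency band), halving the length. The odd-length padding rule, the number of levels, and the function names are illustrative assumptions; the result also depends on how genes are ordered along the profile, which the abstract does not specify.

```python
import math


def haar_level(x):
    """One level of the orthonormal discrete Haar transform.

    Returns (low, high): pairwise sums and differences scaled by 1/sqrt(2).
    Odd-length input is padded by repeating the last value (an assumption).
    """
    x = list(x)
    if len(x) % 2:
        x.append(x[-1])
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def haar_reduce(x, levels=1, band="low"):
    """Apply Haar repeatedly to the low band and return the last low or high band.

    Note: 'high' here is only the detail band of the final level.
    """
    low, high = list(x), []
    for _ in range(levels):
        low, high = haar_level(low)
    return low if band == "low" else high


# Toy usage: an 8-gene profile reduced to 2 features per band.
profile = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
print(haar_reduce(profile, levels=2, band="low"))
print(haar_reduce(profile, levels=2, band="high"))
```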
Conclusion: Overall, we provide a successful framework that yields shorter computation times and smaller ML models, which is especially valuable where memory and time constraints are critical.
scADCA: An Anomaly Detection-Based scRNA-seq Dataset Cell Type Annotation Method for Identifying Novel Cells
Authors: Yongle Shi, Yibing Ma, Xiang Chen and Jie Gao
Available online: 14 October 2024
Background: With the rapid evolution of single-cell RNA sequencing (scRNA-seq) technology, cellular heterogeneity in complex tissues can now be studied at unprecedented resolution. A critical task for this technology is cell-type annotation; however, challenges persist, particularly in annotating novel cell types.
Objective: Current methods rely heavily on well-annotated reference data and use correlation comparisons to determine cell types. However, identifying novel cells remains unstable because of the inherent complexity and heterogeneity of scRNA-seq data and cell types. To address this problem, we propose scADCA, an anomaly detection-based method that identifies novel cell types and annotates the entire dataset.
Methods: Convolutional modules and fully connected networks are integrated into an autoencoder, which is trained on the reference dataset to obtain reconstruction errors. A threshold based on these errors distinguishes novel from known cells in the query dataset. After novel cells are identified, a multinomial logistic regression model annotates the full dataset.
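The novel-cell thresholding step can be sketched as follows. The abstract does not say how the threshold is derived from the reference reconstruction errors; this sketch assumes a linearly interpolated percentile of those errors (95th by default) and uses mean squared error per cell. The autoencoder itself is omitted: `x_hat` stands in for whatever reconstruction the trained network returns, and the subsequent multinomial logistic regression for known cells is not shown. All function names are hypothetical.

```python
import math


def reconstruction_error(x, x_hat):
    """Mean squared error between a cell's expression vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def percentile(values, q):
    """Linearly interpolated percentile of values, with q in [0, 100]."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def flag_novel(reference_errors, query_errors, q=95.0):
    """Flag query cells whose error exceeds the q-th percentile of reference errors."""
    threshold = percentile(reference_errors, q)
    return threshold, [e > threshold for e in query_errors]


# Toy usage: the second query cell reconstructs poorly and is flagged as novel.
ref_errors = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13]
query_errors = [0.10, 0.45, 0.07]
threshold, is_novel = flag_novel(ref_errors, query_errors)
print(threshold, is_novel)
```

Because the autoencoder is trained only on reference (known) cell types, cells from unseen types tend to reconstruct worse, which is what makes a reconstruction-error threshold usable as an anomaly detector.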
Results: Using a simulated dataset, three real scRNA-seq pancreatic datasets, and a real scRNA-seq lung cancer cell line dataset, we compared scADCA with six other cell-type annotation methods; scADCA showed competitive performance in terms of distinguished accuracy, full accuracy, F1-score, and confusion matrices.
Conclusion: The scADCA method can be further improved and extended to achieve better performance and broader application in cell-type annotation, helping to improve the accuracy and reliability of cytology research and to promote the development of single-cell omics.
CLPr_in_ML: Cleft Lip and Palate Reconstructed Features with Machine Learning
Authors: Baitong Chen, Ning Li and Wenzheng Bao
Available online: 09 October 2024
Background: Cleft lip and palate are among the most common congenital craniofacial malformations in humans, affecting tens of millions of patients worldwide. The harms of this disease are multifaceted, extending beyond the obvious facial malformation to physiological function, oral health, psychological well-being, and social life.
Objective: The primary objective of our study is to demonstrate the importance of imaging in detecting cleft lip and palate. By observing the morphological and structural abnormalities involving the lip and palate through imaging methods, this study aims to establish imaging as the primary diagnostic approach for this disease.
Methods: In this work, we proposed a novel model to analyze, in silico, cone-beam CT (CBCT) images of patients with unilateral complete cleft lip and palate after velopharyngeal closure and of non-cleft patients from the Department of Stomatology, Xuzhou First People's Hospital. To demonstrate generalization, a simulated dataset was constructed from the actual dataset using a random disturbance factor. We extracted several raw features from the CBCT images in detail and then proposed a novel feature reconstruction method, comprising six types of reconstructed factors, to reconstruct these features. The reconstructed features were then used to train machine learning algorithms. Finally, performance was analyzed on the test set and an independent dataset.
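The abstract does not define the six reconstructed factors, but the results refer to min, max, and average operators. The hypothetical sketch below shows one way such operators could turn raw CBCT-derived measurements into reconstructed features, by applying each operator to every pair of raw features and concatenating the outputs (an "all operators" set), together with a random-disturbance step for building a simulated dataset in which each feature is scaled by a factor drawn uniformly from [1 - eps, 1 + eps]. Both the pairing scheme and the disturbance model are assumptions, not the authors' method.

```python
import itertools
import random

# Hypothetical operator set; the paper's six factors are not specified.
OPERATORS = {
    "min": min,
    "max": max,
    "average": lambda a, b: (a + b) / 2.0,
}


def reconstruct_features(raw, ops=("min", "max", "average")):
    """Apply each chosen operator to every pair of raw features and concatenate."""
    out = []
    for name in ops:
        op = OPERATORS[name]
        for a, b in itertools.combinations(raw, 2):
            out.append(op(a, b))
    return out


def perturb(sample, eps=0.05, rng=None):
    """Simulate a sample by scaling each feature with a random factor in [1-eps, 1+eps]."""
    rng = rng or random.Random()
    return [v * (1.0 + rng.uniform(-eps, eps)) for v in sample]


# Toy usage: three raw measurements give 3 pairs x 3 operators = 9 features.
raw = [1.0, 4.0, 2.0]
print(reconstruct_features(raw))
print(perturb(raw, eps=0.05, rng=random.Random(1)))
```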
Results: Among the operator features compared, the min, max, average, and all-operator features achieved good performance on both the test set and the independent set.
Conclusion: With the selected reconstructed operator features, most classification models, including Gradient Boosting, Hist Gradient Boosting, Multilayer Perceptron, LightGBM, and broad learning, performed well.