Current Bioinformatics - Online First
The Use of Gene Expression Profiling to Predict Molecular Subtypes of Breast Cancer by a New Machine Learning Algorithm: Random Forest
Available online: 14 October 2024
Background: One of the main causes of cancer-related mortality in women is breast cancer [BC]. This malignancy has four molecular subtypes, and the efficacy of adjuvant therapy differs among them. Gene expression profiles provide valuable information for patients whose prognosis is not clear from clinical markers and immunohistochemistry.
Objective: In this study, we aim to predict the molecular subtypes of BC from a gene expression dataset of BC patients and normal samples using six well-known ensemble machine-learning techniques.
Methods: Two microarray datasets, [GSE45827] and [GSE140494], were downloaded from the Gene Expression Omnibus [GEO] database. These datasets comprise 21 normal tissue samples and primary invasive breast cancer samples from a cohort analysis [57 basal, 36 HER2, 56 Luminal A, and 66 Luminal B]. We used the AdaBoost, Random Forest [RF], Artificial Neural Network [ANN], Naïve Bayes [NB], Classification and Regression Tree [CART], and Linear Discriminant Analysis [LDA] classifiers.
Results: The RF and NB classifiers outperform the other models in predicting the BC subtype. RF performs best, with accuracy ranging from 0.89 to 1.0, compared with an average accuracy of 0.91 for its closest competitor, NB. Our approach perfectly discriminates unaffected cases [normal] from carcinoma; in this case, RF predicts with zero errors. Additionally, we used PCA, DHWT low-frequency, and DHWT high-frequency to reduce the dimensionality of the numerous gene expression values. With this data reduction, LDA achieves up to a 95% improvement in performance. Moreover, feature selection yielded the best performance, recorded by RF with a classification accuracy of 98%.
Conclusion: Overall, we provide a successful framework that leads to shorter computation times and smaller ML models, especially where memory and time restrictions are crucial.
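The low-frequency and high-frequency reduction mentioned in the Results can be illustrated with a minimal sketch. It assumes that DHWT denotes the discrete Haar wavelet transform and that a single decomposition level is applied per sample; the abstract states neither, and the function name `haar_dwt` and the sample values are hypothetical.

```python
import math

def haar_dwt(values):
    """One level of the discrete Haar wavelet transform.

    Returns (low, high): the approximation (low-frequency) and detail
    (high-frequency) coefficients, each half the length of the input.
    """
    values = list(values)
    if len(values) % 2 != 0:
        # Assumption: pad odd-length input by repeating the last value.
        values.append(values[-1])
    scale = math.sqrt(2.0)
    # Pairwise sums capture the smooth trend, pairwise differences the detail.
    low = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return low, high

# Hypothetical expression values for one sample (not taken from the datasets).
sample = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0]
low, high = haar_dwt(sample)
```

Either coefficient vector has half as many entries as the input, which is where the reduction in feature count comes from; repeating the transform on `low` would halve it again.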
scADCA: An Anomaly Detection-Based scRNA-seq Dataset Cell Type Annotation Method for Identifying Novel Cells
Authors: Yongle Shi, Yibing Ma, Xiang Chen and Jie Gao
Available online: 14 October 2024
Background: With the rapid evolution of single-cell RNA sequencing technology, the study of cellular heterogeneity in complex tissues has reached an unprecedented resolution. One critical task of the technology is cell-type annotation. However, challenges persist, particularly in annotating novel cell types.
Objective: Current methods rely heavily on well-annotated reference data, using correlation comparisons to determine cell types. However, identifying novel cells remains unstable due to the inherent complexity and heterogeneity of scRNA-seq data and cell types. To address this problem, we propose scADCA, a method based on anomaly detection, for identifying novel cell types and annotating the entire dataset.
Methods: Convolutional modules and fully connected networks are integrated into an autoencoder, which is trained on the reference dataset to obtain reconstruction errors. A threshold based on these errors distinguishes novel cells from known cells in the query dataset. After novel cells are identified, a multinomial logistic regression model fully annotates the dataset.
Results: Using a simulation dataset, three real scRNA-seq pancreatic datasets, and a real scRNA-seq lung cancer cell line dataset, we compare scADCA with six other cell-type annotation methods, demonstrating competitive performance in terms of distinguished accuracy, full accuracy, F1-score, and confusion matrix.
Conclusion: The scADCA method can be further improved and expanded to achieve better performance in cell type annotation, which helps improve the accuracy and reliability of cytology research and promotes the development of single-cell omics.
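The thresholding step described in the Methods can be sketched as follows. This is not the scADCA implementation: it assumes the reconstruction errors have already been produced by a trained autoencoder, and it uses the 95th percentile of the reference errors as the cut-off, a choice the abstract does not specify. The function names and error values are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac

def flag_novel(reference_errors, query_errors, q=95.0):
    """Flag query cells whose reconstruction error exceeds the threshold.

    The threshold is the q-th percentile of the reference errors
    (assumption: q=95 is an illustrative default, not the paper's value).
    """
    threshold = percentile(reference_errors, q)
    return threshold, [err > threshold for err in query_errors]

# Hypothetical reconstruction errors (not results from the paper).
reference_errors = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08, 0.12, 0.11, 0.14]
query_errors = [0.10, 0.45, 0.12, 0.60]
threshold, is_novel = flag_novel(reference_errors, query_errors)
```

Cells flagged `True` would be treated as novel; the remaining cells would then go to a supervised classifier such as the multinomial logistic regression named in the abstract.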