Skip to content
2000
image of The Use of Gene Expression Profiling to Predict Molecular Subtypes of Breast Cancer by a New Machine Learning Algorithm: Random Forest

Abstract

Background

One of the main causes of cancer-related mortality in women is breast cancer [BC]. There were four molecular subtypes of this malignancy, and adjuvant therapy efficacy differed based on these subtypes. Gene expression profiles provide valuable information that is helpful for patients whose prognosis is not clear from clinical markers and immunohistochemistry.

Objective

In this study, we aim to predict molecular types of BC using a gene expression dataset of patients with BC and normal samples using six well-known ensemble machine-learning techniques.

Methods

Two microarray datasets were downloaded; [GSE45827] and [GSE140494] from the Gene Expression Omnibus [GEO] database. These datasets comprise 21 samples of normal tissues that were part of a cohort analysis of primary invasive breast cancer [57 basal, 36 HER2, 56 Luminal A, and 66 Luminal B]. Namely, we used AdaBoost, Random Forest [RF], Artificial Neural Network [ANN], Naïve Bayes [NB], Classification and Regression Tree [CART], and Linear Discriminant Analysis [LDA] classifiers.

Result

The results of the data analysis show that the RF and NB classifiers outperform the other models in the prediction of the BC subtype. The RF shows superior performance with an accuracy range between 0.89 and 1.0 in contrast to its competitor NB, which has an average accuracy of 0.91. Our approach perfectly discriminates un-affected cases [normal] from the carcinoma. In this case, the RF provides perfect prediction with zero errors. Additionally, we used PCA, DHWT low-frequency, and DHWT high-frequency to perform a dimensional reduction for the numerous gene expression values. Consequently, the LDA achieves up to 95% improvement in performance through data reduction. Moreover, feature selection allowed for the best performance, which is recorded by the RF with classification accuracy 98%.

Conclusion

Overall, we provide a successful framework that leads to shorter computation times and smaller ML models, especially where memory and time restrictions are crucial.

Loading

Article metrics loading...

/content/journals/cbio/10.2174/0115748936314079240827062219
2024-10-14
2024-11-02
Loading full text...

Full text loading...

/content/journals/cbio/10.2174/0115748936314079240827062219
Loading
/content/journals/cbio/10.2174/0115748936314079240827062219
Loading

Data & Media loading...

This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error
Please enter a valid_number test