Predicting Molecular Subtypes of Breast Cancer Using Gene Expression Profiling and Random Forest Classifier

Abdul-Fattah Fararjeh; Enas Al-khlifeh; Abdulaziz A. Aloliqi; Ahmad S. Tarawneh; Ahmad B. Hassanat

doi:10.2174/0115748936314079240827062219

ISSN: 1574-8936
E-ISSN: 2212-392X

Authors: Abdul-Fattah Fararjeh¹, Enas Al-khlifeh¹, Abdulaziz A. Aloliqi², Ahmad S. Tarawneh³ and Ahmad B. Hassanat³

¹ Department of Medical Laboratory Sciences, Faculty of Science, Al-Balqa Applied University, Al-Salt, Jordan; ² Department of Basic Health Sciences, College of Applied Medical Sciences, Qassim University, Buraidah, Saudi Arabia; ³ Faculty of Information Technology, Mutah University, Karak, Jordan
Source: Current Bioinformatics, Volume 20, Issue 10, Dec 2025, pp. 890-903
- Received: 02 Mar 2024
- Accepted: 10 Jul 2024
- Available online: 10 Oct 2024

Abstract

Background

Breast cancer (BC) is one of the main causes of cancer-related mortality in women. BC comprises four main molecular subtypes, and the efficacy of adjuvant therapy differs among them. Gene expression profiles provide valuable information for patients whose prognosis cannot be determined clearly from clinical markers and immunohistochemistry.

Objective

In this study, we aim to predict the molecular subtypes of BC from a gene expression dataset of BC patients and normal samples, using six well-known machine-learning classifiers, including ensemble methods.

Methods

Two microarray datasets, GSE45827 and GSE140494, were downloaded from the Gene Expression Omnibus (GEO) database. Together they comprise 21 normal tissue samples and 215 primary invasive breast cancer samples from a cohort analysis (57 basal, 36 HER2, 56 Luminal A, and 66 Luminal B). The classifiers used were AdaBoost, Random Forest (RF), Artificial Neural Network (ANN), Naïve Bayes (NB), Classification and Regression Tree (CART), and Linear Discriminant Analysis (LDA).
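A minimal, dependency-free sketch of how one of these classifiers, Naïve Bayes (assumed here to be the Gaussian variant), might be scored with stratified k-fold cross-validation on a samples-by-genes expression matrix. The abstract does not state the evaluation protocol, so the 5-fold scheme, the function names (`fit_gaussian_nb`, `predict_gaussian_nb`, `stratified_folds`, `cross_val_accuracy`), and the toy data are all illustrative assumptions rather than the authors' pipeline.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class log-priors and per-feature means/variances (Gaussian NB)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))  # one tuple per feature (gene)
        means = [sum(col) / len(rows) for col in columns]
        variances = [
            sum((v - mu) ** 2 for v in col) / len(rows) + var_smoothing
            for col, mu in zip(columns, means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class label with the highest log-posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, mu, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal each class round-robin into folds
    return folds


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean test accuracy of Gaussian NB over stratified k-fold cross-validation."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_gaussian_nb(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy stand-in for an expression matrix (not real GEO data):
    # 3 classes x 20 samples x 4 "genes", drawn around different centres.
    rng = random.Random(42)
    X, y = [], []
    for label, centre in [("Normal", 0.0), ("Basal", 5.0), ("LumA", 10.0)]:
        for _ in range(20):
            X.append([rng.gauss(centre, 1.0) for _ in range(4)])
            y.append(label)
    print(f"5-fold Gaussian NB accuracy: {cross_val_accuracy(X, y):.2f}")
```

Stratification is the main design choice here: the class sizes reported above are uneven (21 normal versus 66 Luminal B), and dealing each class round-robin into folds keeps every fold's class mix close to the whole dataset's. The same fold-splitting loop could, in principle, be reused for the other five classifiers, for example via scikit-learn estimators.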

Results

The results show that the RF and NB classifiers outperform the other models in predicting the BC subtype. RF performs best, with accuracy ranging from 0.89 to 1.0, compared with an average accuracy of 0.91 for NB. Our approach discriminates normal (unaffected) samples from carcinoma perfectly; for this task, RF makes no errors. Additionally, PCA, DHWT low-frequency, and DHWT high-frequency coefficients were used to reduce the dimensionality of the gene expression data. With this data reduction, LDA achieves up to a 95% improvement in performance. Finally, feature selection yielded the best overall performance: RF reached a classification accuracy of 98%.
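The abstract does not expand "DHWT". Assuming it denotes the discrete Haar wavelet transform, the sketch below shows how its low-frequency (approximation) and high-frequency (detail) coefficients can shrink one sample's gene expression vector, halving its length at each level. Applying the transform per sample, the odd-length padding rule, and the names `haar_step` and `haar_reduce` are assumptions for illustration, not the authors' implementation.

```python
import math


def haar_step(values):
    """One level of the discrete Haar wavelet transform.

    Returns (low, high): scaled pairwise sums (low-frequency approximation)
    and scaled pairwise differences (high-frequency detail), each about half
    the input length. Odd-length input is padded by repeating the last value
    (an assumption; other padding rules exist).
    """
    values = list(values)
    if len(values) % 2:
        values.append(values[-1])
    s = math.sqrt(2.0)
    low = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return low, high


def haar_reduce(values, levels=1, band="low"):
    """Shrink one sample's gene expression vector with a multi-level Haar DWT.

    Each level re-transforms the previous low-frequency band, halving its
    length. band="low" returns the final approximation coefficients;
    band="high" returns the detail coefficients of the last level.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    low, high = list(values), []
    for _ in range(levels):
        low, high = haar_step(low)
    return low if band == "low" else high


if __name__ == "__main__":
    # Toy expression vector for a single sample (not real data).
    sample = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
    print("low, 1 level: ", haar_reduce(sample, levels=1, band="low"))
    print("high, 1 level:", haar_reduce(sample, levels=1, band="high"))
    print("low, 2 levels:", haar_reduce(sample, levels=2, band="low"))
```

Because each Haar coefficient combines neighbouring features, the reduced vector depends on the order in which genes appear in the expression matrix; PCA component scores, by contrast, do not depend on column order.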

Conclusion

Overall, we provide a successful framework that yields shorter computation times and smaller ML models, which is especially useful where memory and time constraints are critical.

Curr Bioinform 20, 890 (2025); https://doi.org/10.2174/0115748936314079240827062219

Article Type: Research Article

Keyword(s): Breast cancer; gene expression signature; immunohistochemistry; machine learning; predictive analysis; random forest
