Comprehensive Analysis of Oversampling Techniques for Addressing Class Imbalance Employing Machine Learning Models
16 Aug 2024 · 01 Nov 2024 · 10 Dec 2024
Abstract
Imbalanced datasets present a significant challenge in machine learning, often leading to biased models that favor the majority class. Recent oversampling techniques such as SMOTE, Borderline SMOTE, and ADASYN attempt to mitigate this bias. This study investigates these techniques in conjunction with machine learning models such as SVM, Decision Tree, and Logistic Regression. The results reveal critical challenges such as noise amplification and overfitting, which we address by refining the oversampling approaches to improve model performance and generalization.
To address this challenge, the minority class is oversampled until it matches the size of the majority class. Oversampling techniques such as SMOTE (Synthetic Minority Oversampling Technique), Borderline SMOTE, and ADASYN (Adaptive Synthetic Sampling) are used in this work.
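The paper does not name a specific implementation of these samplers; as a minimal sketch, all three are available in the imbalanced-learn library, demonstrated here on an illustrative synthetic dataset with a roughly 9:1 class ratio (the dataset and ratio are assumptions for demonstration, not the study's data):

from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN

# Synthetic binary dataset with roughly 10% minority class.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42
)
print("Original distribution:", Counter(y))

# Each sampler returns a rebalanced copy of the data; they differ in
# where they place synthetic minority points (uniformly for SMOTE, near
# the class boundary for Borderline SMOTE, in hard-to-learn regions for ADASYN).
for sampler in (SMOTE(random_state=42),
                BorderlineSMOTE(random_state=42),
                ADASYN(random_state=42)):
    X_res, y_res = sampler.fit_resample(X, y)
    print(type(sampler).__name__, "->", Counter(y_res))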
The objective is to perform a comprehensive analysis of various oversampling methods for handling the class imbalance issue using ML models.
The proposed methodology uses BERT technique which removes the pre-processing step. Various proposed oversampling techniques in the literature are used for balancing the data, followed by feature extraction followed by text classification using ML algorithms. Experiments are performed using ML classification algorithms like Decision tree (DT), Logistic regression (LR), Support vector machine (SVM) and Random forest (RF) for categorizing the data.
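A sketch of this pipeline follows, assuming the sentence-transformers library with the BERT-based model "bert-base-nli-mean-tokens" for embeddings and plain SMOTE as the sampler; the placeholder corpus, model choice, and sampler choice are illustrative assumptions, not the paper's exact setup:

from imblearn.over_sampling import SMOTE
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder corpus with a 9:1 class imbalance; real experiments would
# load the study's dataset here.
texts = [f"routine status update number {i}" for i in range(90)] + \
        [f"urgent incident report number {i}" for i in range(10)]
labels = [0] * 90 + [1] * 10

# BERT sentence embeddings serve as features, standing in for manual pre-processing.
encoder = SentenceTransformer("bert-base-nli-mean-tokens")
X = encoder.encode(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42
)

# Balance only the training split so synthetic points never leak into the test set.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

# The four classifiers named in the paper.
classifiers = {
    "DT": DecisionTreeClassifier(random_state=42),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=42),
}
for name, clf in classifiers.items():
    clf.fit(X_bal, y_bal)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")

Oversampling after the train/test split is a deliberate design choice: resampling before the split would let synthetic points derived from test samples contaminate the evaluation.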
The results show the greatest improvement for SVM combined with Borderline SMOTE, which achieves an accuracy of 71.9% and an MCC of 0.53.
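Both reported metrics are available in scikit-learn; the label arrays below are placeholders rather than the study's outputs:

from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [0, 0, 1, 1, 0, 1, 0, 0]  # placeholder ground-truth labels
y_pred = [0, 1, 1, 1, 0, 0, 0, 0]  # placeholder model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))  # fraction of correct predictions
print("MCC:", matthews_corrcoef(y_true, y_pred))    # correlation in [-1, 1], robust to imbalance

MCC complements accuracy here because, unlike accuracy, it is not inflated by a model that simply favors the majority class.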
By addressing this fundamental issue of class imbalance, the suggested method assists in the development of fairer and more effective ML models.