Volume 15, Issue 3
  • ISSN: 2666-2558
  • E-ISSN: 2666-2566

Abstract

Introduction: Stemming is an important preprocessing step in text classification and can contribute to higher classification accuracy. Although many stemmers have been proposed for English, few have been proposed for Arabic text. The Arabic language has gained increasing attention over the past decades, and with it the need to further improve Arabic text classification.

Methods: This work combined the recently proposed P-stemmer with various classifiers to find the classifier that works best with the P-stemmer for Arabic text classification. As part of this work, a synthesized dataset was collected.

Results: The experiments show that the P-stemmer has a positive effect on classification. The degree of improvement depends on the classifier, which is reasonable since classifiers differ in their methodologies. Moreover, the experiments show that the best classifier to use with the P-stemmer is Naïve Bayes (NB). This is an interesting result, as this classifier is well known for its fast learning and classification times.

Discussion: First, the P-stemmer needs continuous improvement through further optimization steps in order to further improve Arabic text categorization. This can be achieved by combining more classifiers with the stemmer, by optimizing the other natural language processing steps, and by improving the set of stemming rules. Second, the lack of sufficient Arabic datasets, especially large ones, remains an issue.

Conclusion: In this work, an improved P-stemmer was proposed by combining its use with various classifiers. To evaluate its performance, and due to the lack of Arabic datasets, a novel Arabic dataset was synthesized from various online news pages. The P-stemmer was then combined with Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and K-Star.
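The abstract does not list the P-stemmer's rules, so the following is only a minimal sketch of where a stemmer sits in a classification pipeline. It assumes a hypothetical light stemmer: the affix lists `PREFIXES` and `SUFFIXES` and the `MIN_STEM` threshold are illustrative assumptions, not the P-stemmer. Tokens are stemmed before a multinomial Naïve Bayes classifier (add-one smoothing) counts them, so surface variants such as `الفريق` and `فريقها` can share one feature. The training sentences are invented toy data, not the paper's dataset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical affix lists for illustration only; these are NOT the P-stemmer's rules.
# Longer prefixes come first so that "وال" is tried before "و".
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ية", "ة")
MIN_STEM = 3  # assumed minimum stem length, to avoid over-stemming short words


def light_stem(word: str) -> str:
    """Strip at most one known prefix and one known suffix (toy light stemmer)."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[: -len(s)]
            break
    return word


def tokenize(text: str, stem: bool = True) -> list[str]:
    """Whitespace tokenization, optionally followed by stemming."""
    tokens = text.split()
    return [light_stem(t) for t in tokens] if stem else tokens


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels, stem=True):
        self.stem = stem
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc, stem)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        tokens = tokenize(doc, self.stem)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # log P(c) + sum of log P(token | c) over known tokens
            score = math.log(self.class_counts[c] / n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[c][t]
                score += math.log((count + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Invented toy training data (not from the paper).
TRAIN_DOCS = ["الفريق فاز", "الفريق لعب", "الاسعار ارتفعت", "الاسعار انخفضت", "السوق مستقر"]
TRAIN_LABELS = ["sport", "sport", "economy", "economy", "economy"]

if __name__ == "__main__":
    with_stem = MultinomialNB().fit(TRAIN_DOCS, TRAIN_LABELS, stem=True)
    without_stem = MultinomialNB().fit(TRAIN_DOCS, TRAIN_LABELS, stem=False)
    print(with_stem.predict("فريقها"), without_stem.predict("فريقها"))
```

On this toy data, the stemmed model reduces the unseen form `فريقها` to the stem `فريق`, which it saw in the sport documents, and predicts `sport`. The unstemmed model has no matching feature, so it falls back to the class prior and predicts the majority class `economy`. This only illustrates the mechanism by which stemming can help a classifier; it is not a reproduction of the paper's experiments.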

DOI: 10.2174/2666255813999200904114023
2022-03-01
2025-07-14