Volume 15, Issue 5
  • ISSN: 2666-2558
  • E-ISSN: 2666-2566

Abstract

Background: Part-of-Speech (POS) tagging is the process of assigning a suitable part of speech to each word in a given context, for example deciding whether a word is a verb, a noun, or a particle. POS tagging is an important preprocessing step in many Natural Language Processing (NLP) applications such as question answering, text summarization, and information retrieval.

Objectives: The performance of NLP applications depends on the accuracy of POS taggers, since assigning the right tags to the words in a sentence enables the downstream application to work properly. Many approaches have been proposed for the Arabic language, but more investigation is needed to improve the efficiency of Arabic POS taggers.

Methods: In this study, we propose a supervised POS tagging system for the Arabic language using Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) together with a Hidden Markov Model (HMM). The tagging process is treated as an optimization problem and modeled as a swarm consisting of a group of particles. Each particle represents a sequence of tags. The PSO algorithm is applied to find the best sequence of tags, which represents the correct tags of the sentence. The genetic operators, crossover and mutation, are used to find the personal best, global best, and velocity of the PSO algorithm. The HMM is used to compute the fitness of the particles in the swarm.

Results: The performance of the proposed approach is evaluated on the KALIMAT dataset, which consists of 18 million words, with a tag set of 45 tags that covers all Arabic POS tags. The proposed tagger achieved an accuracy of 90.5%.

Conclusion: Experimental results revealed that the proposed tagger achieved promising results compared to four existing approaches. Other approaches can identify only three tags: noun, verb, and particle. In addition, the accuracy for some tags exceeded that achieved by other approaches.
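
The search described in the Methods part of the abstract can be illustrated with a small sketch. The abstract does not give the operator definitions, update rules, or parameter settings, so everything below is an assumption made for illustration: the function names (`hmm_log_score`, `crossover`, `mutate`, `pso_ga_tag`), the first-order HMM scored in log space with a probability floor for unseen events, single-point crossover toward the personal and global best as a stand-in for the velocity update, and all default parameter values. It should not be read as the authors' implementation.

```python
import math
import random


def hmm_log_score(words, tags, start, trans, emit, floor=1e-6):
    """Fitness: log-probability of a tag sequence under a first-order HMM.

    start[tag], trans[(prev_tag, tag)] and emit[(tag, word)] are plain
    dictionaries of probabilities; missing entries fall back to `floor`
    (an assumed, very crude smoothing choice).
    """
    score = 0.0
    prev = None
    for word, tag in zip(words, tags):
        p_tag = start.get(tag, floor) if prev is None else trans.get((prev, tag), floor)
        score += math.log(p_tag) + math.log(emit.get((tag, word), floor))
        prev = tag
    return score


def crossover(a, b, rng):
    """Single-point crossover: prefix taken from a, suffix taken from b."""
    if len(a) < 2:
        return list(b)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(seq, tagset, rate, rng):
    """Replace each tag with a random tag from tagset with probability `rate`."""
    return [rng.choice(tagset) if rng.random() < rate else t for t in seq]


def pso_ga_tag(words, tagset, start, trans, emit,
               n_particles=20, n_iters=50, mutation_rate=0.1, seed=0):
    """Search for a high-scoring tag sequence with a GA-flavoured swarm."""
    rng = random.Random(seed)

    def fitness(tags):
        return hmm_log_score(words, tags, start, trans, emit)

    # Each particle is one candidate tag sequence for the sentence.
    swarm = [[rng.choice(tagset) for _ in words] for _ in range(n_particles)]
    pbest = [list(p) for p in swarm]
    gbest = list(max(pbest, key=fitness))

    for _ in range(n_iters):
        for i, particle in enumerate(swarm):
            # Assumed "velocity" step: recombine with the personal best,
            # then with the global best, then apply random mutation.
            cand = crossover(particle, pbest[i], rng)
            cand = crossover(cand, gbest, rng)
            cand = mutate(cand, tagset, mutation_rate, rng)
            swarm[i] = cand
            if fitness(cand) > fitness(pbest[i]):
                pbest[i] = list(cand)
        gbest = list(max(pbest, key=fitness))
    return gbest
```

With toy probability tables (for example a three-tag set of noun, verb, and particle and a handful of hand-written transition and emission entries), `pso_ga_tag(words, tagset, start, trans, emit)` returns one tag per word. For a real tagger the tables would be estimated from a tagged corpus, and for short sentences an exact decoder such as Viterbi would be a natural baseline to compare the swarm search against.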

/content/journals/rascs/10.2174/2666255814666210114120558
2022-06-01
2024-10-20