2000
Volume 14, Issue 1
  • ISSN: 2666-2558
  • E-ISSN: 2666-2566

Abstract

Background: Part-of-speech (POS) tagging is the process of identifying the correct grammatical category of each word based on its meaning and context in a text document. It is one of the preliminary steps in processing natural language text, so any error in POS tagging propagates to the downstream NLP applications, and tagging must therefore be handled precisely. Aim: The purpose of this study is to develop a deep-level tagger for Malayalam that indicates the semantics of nouns and verbs in a text document. Methods: The proposed model is a two-tier architecture combining deep learning and rule-based approaches. The first tier consists of a tagging model trained using a tagged corpus of 287,000 words. To improve the depth of tagging, a suffix stripper is also used, which provides morphological features to the shallow machine learning model. Results: The system was trained on 230,000 words and tested on 57,000 words. The tagging accuracy is 92.03% for the phase-1 architecture and 98.11% for the phase-2 architecture. The overall tagging accuracy is 91.82%. Conclusion: The distinguishing feature of the proposed tagger is its depth in tagging nouns. This deep-level information can be used in semantic processing applications for natural language text such as anaphora resolution, text summarization, and machine translation.

/content/journals/rascs/10.2174/2213275912666190204133657
2021-01-01
2024-11-08