POS Tagging for Amharic: A Machine Learning Approach
Abstract
This paper addresses the automatic prediction of parts of speech (POS) for words in Amharic sentences. We present an experiment involving the study and implementation of POS tagging models. Four statistical taggers, namely the Trigrams’n’Tags (TnT) tagger, a Conditional Random Field (CRF) tagger, a Naive Bayes (NB) classifier, and a Decision Tree (DT) classifier, are applied to Amharic, a morphologically rich language. We compare the performance of all taggers using training and test datasets of the same size. Various language-dependent and language-independent feature sets were constructed, and a combination of them was applied for each algorithm. With these inputs, the CRF-based model outperformed the other taggers. The best accuracy obtained in our experiment is 94.08%. Finally, our study shows that linguistic features play a decisive part in overcoming the limitations of the baseline statistical model for Amharic.
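As a rough illustration of what per-token feature sets for a statistical tagger can look like, the sketch below builds a feature dictionary for each word in a sentence. It is not the feature set used in this study: the function names (`token_features`, `sentence_features`), the feature names, and the affix length are all assumptions chosen for illustration, and the transliterated example sentence is only a placeholder. Dictionaries of this shape are a common input format for CRF and Naive Bayes toolkits.

```python
from typing import Dict, List


def token_features(sentence: List[str], i: int, affix_len: int = 3) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    Hypothetical feature template: context words and character affixes
    are language-independent cues; a real system for Amharic would add
    language-dependent (morphological) features on top of these.
    """
    word = sentence[i]
    feats: Dict[str, object] = {
        "word": word,
        "length": len(word),
        "is_digit": word.isdigit(),
        "is_first": i == 0,
        "is_last": i == len(sentence) - 1,
        # Neighbouring words, with boundary markers at the sentence edges.
        "prev_word": sentence[i - 1] if i > 0 else "<BOS>",
        "next_word": sentence[i + 1] if i < len(sentence) - 1 else "<EOS>",
    }
    # Character prefixes and suffixes up to affix_len characters,
    # added only when the word is longer than the affix itself.
    for n in range(1, affix_len + 1):
        if len(word) > n:
            feats[f"prefix_{n}"] = word[:n]
            feats[f"suffix_{n}"] = word[-n:]
    return feats


def sentence_features(sentence: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Placeholder transliterated tokens, used only to exercise the functions.
example = sentence_features(["abebe", "beso", "bela"])
```

Each tagger can then be trained on the same token-level representations, which makes it straightforward to vary which features are switched on for each algorithm.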
Article Details
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At a minimum, the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.