Enhancing Multi-Label News Text Classification for an Understudied Language: A Comprehensive Study on CNN Performance and Pre-Trained Word Embeddings
Abstract
News texts today are often classified with a multi-label scheme, which allows a potentially large number of labels to be assigned to a single instance. Most earlier studies considered only mutually exclusive classification at a single level; the primary goal of this study, by contrast, was to categorise news material using multiple labels. Large volumes of text documents are now produced from a variety of offline and online sources. This news text is generated in a disordered state, so timely access to the needed content is challenging. Compared with traditional text classification, multi-label classification is more difficult because of its multi-dimensional label space. This study uses convolutional neural networks (CNNs) for Afaan Oromo multi-label news text classification because pre-trained word embeddings are easy to integrate into them. With pre-trained word embeddings and a train-test ratio of 10/90, the proposed model showed improved performance. The proposed CNN models may be helpful for labelling Afaan Oromo news articles. Many researchers developing Afaan Oromo classifiers aim to use various learning algorithms to raise classification accuracy as the number of categories or labels increases, and they have applied basic machine learning methods to address computation time. However, existing work on low-resource languages has focused on flat, hierarchical, and multi-class classification, whereas we built a model for multi-label text classification and applied it using a deep learning algorithm. Over 5640 Afaan Oromo news items across eight main news categories are analysed experimentally. Python served as the experimental platform for both text classification and word embedding.
After the model was fully implemented, the best precision, recall, F1 score, and accuracy obtained with pre-trained word embeddings at a train-test ratio of 10/90 were 89.7%, 88.6%, 93.3%, and 96.5%, respectively.
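The multi-label setting and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label names, the 0.5 decision cutoff, and the choice of micro-averaging are assumptions made only for illustration, since the abstract names neither the eight categories nor the averaging scheme. Each article is encoded as a multi-hot vector, per-label scores (such as a CNN's sigmoid outputs) are thresholded independently, and precision, recall, and F1 are computed over all article-label pairs.

```python
from typing import List, Sequence, Tuple

# Hypothetical label set: the study uses eight categories, which the abstract does not name.
LABELS = ["politics", "sport", "business", "health"]


def to_multi_hot(tags: Sequence[str], labels: Sequence[str] = LABELS) -> List[int]:
    """Encode an article's tags as a 0/1 vector with one position per label."""
    tag_set = set(tags)
    return [1 if label in tag_set else 0 for label in labels]


def threshold(scores: Sequence[Sequence[float]], cutoff: float = 0.5) -> List[List[int]]:
    """Turn per-label scores into independent yes/no decisions (cutoff is an assumption)."""
    return [[1 if s >= cutoff else 0 for s in row] for row in scores]


def micro_scores(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over every (article, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: three articles, each of which may carry several labels at once.
y_true = [
    to_multi_hot(["politics", "business"]),
    to_multi_hot(["sport"]),
    to_multi_hot(["health", "politics"]),
]
# Made-up per-label scores standing in for a model's outputs.
scores = [
    [0.9, 0.1, 0.4, 0.2],
    [0.2, 0.8, 0.1, 0.6],
    [0.7, 0.1, 0.2, 0.9],
]
y_pred = threshold(scores)
precision, recall, f1 = micro_scores(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Unlike multi-class classification, where exactly one label is chosen per article, each position in the vector is decided on its own, so an article can receive zero, one, or several labels.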
Keywords
DOI: https://doi.org/10.52088/ijesty.v5i4.987
Copyright (c) 2025 Diriba Gichile Rundasa, Arulmurugan Ramu



























