Online Newspaper Clustering in Aceh using the Agglomerative Hierarchical Clustering Method

Rizal Tjut Adek, Rozzy Kesuma Dinata, Ananda Ditha

Abstract


Rapid progress in information technology, especially the internet, has produced a vast amount of information. Because publishing an article on a website is easy, the number of news pages has exploded, which can overwhelm readers. The diversity and growing volume of news articles make it increasingly difficult for internet users to find the news they want among the large collections of news data on online newspaper sites in Aceh. Text document clustering is therefore needed to group news in Aceh's online newspapers according to the content of each article. In this study, online news from Aceh was grouped using the Agglomerative Hierarchical Clustering method. News is grouped with a bottom-up strategy: each object starts as its own cluster, and clusters are then merged into larger clusters based on the similarity of keywords in each news item. The resulting clusters are then compared and assigned to news categories. The research design was structured using data flow diagrams to form the research framework. The study used online news text data from 10 online news websites in Aceh, published from July 2016 to March 2017, with 1,000 randomly selected documents. News data was crawled using a PHP script that extracts only the text of the news on each website. News is grouped into the categories religion, politics, law, sports, tourism, education, culture, economy, and technology. The Agglomerative Hierarchical Clustering method achieved an average accuracy of 89.84% in this study.
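
The bottom-up strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the term weighting, similarity measure, or linkage criterion, so raw term counts, cosine similarity, and average linkage are assumptions chosen for illustration, and the function names and the four sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts for one document (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_link(c1, c2, vectors):
    """Mean pairwise similarity between members of two clusters (assumed linkage)."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(docs, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    n_clusters = max(1, n_clusters)
    vectors = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # each document starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = average_link(clusters[a], clusters[b], vectors)
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical sample documents: two about sports, two about politics.
docs = [
    "football match goal team",
    "team wins football league",
    "election party vote parliament",
    "parliament vote on election law",
]
clusters = agglomerative(docs, 2)
print(clusters)
```

Each returned cluster is a list of document indices. In a setup like the one the abstract describes, the merging would stop at the number of news categories, and each cluster would then be compared against the category labels to estimate accuracy.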


Keywords


Document Clustering, Online News, Agglomerative Hierarchical Clustering

DOI: https://doi.org/10.52088/ijesty.v2i1.206

Copyright (c) 2021 Rizal Tjut Adek, Rozzy Kesuma Dinata, Ananda Ditha

International Journal of Engineering, Science and Information Technology (IJESTY) eISSN 2775-2674