Comparison of Support Vector Machine and Naïve Bayes Algorithms Based on TF-IDF in Online Gambling Website Detection
Abstract
The rapid growth of digital technology has accelerated the spread of illegal online content, particularly gambling websites, which undermine social stability and complicate regulatory enforcement. To address this issue, this study develops an automated detection system for online gambling sites using text classification with Term Frequency–Inverse Document Frequency (TF-IDF) features. A total of 1,225 website URLs were collected through web scraping; after preprocessing, 1,166 valid entries remained and were manually labeled into two classes, gambling and normal. Preprocessing consisted of cleaning, tokenization, stopword removal, stemming, and domain parsing, followed by TF-IDF feature extraction, which produced 2,426 numerical features. To mitigate class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training set only. Two machine learning algorithms were implemented and compared: Support Vector Machine (SVM) with four kernels (Linear, RBF, Polynomial, and Sigmoid) and Multinomial Naïve Bayes (MNB). The models were evaluated using accuracy, precision, recall, specificity, and F1-score. SVM with the RBF kernel achieved the best performance, with an accuracy of 91.88% and an F1-score of 93.70%, while MNB obtained an accuracy of 88.46% and an F1-score of 91.00%. These results indicate that SVM, particularly with the RBF kernel, distinguishes gambling websites from normal ones more accurately than MNB on this dataset. The proposed system offers a foundation for automated tools that monitor, detect, and block illegal online gambling content, thereby supporting regulatory enforcement and reducing the negative societal impact of online gambling.
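The five evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' implementation: the names `ConfusionCounts`, `confusion_counts`, and `evaluate` are hypothetical, the labels are made-up toy data, and treating "gambling" as the positive class is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # gambling sites correctly flagged as gambling
    fp: int  # normal sites wrongly flagged as gambling
    tn: int  # normal sites correctly left unflagged
    fn: int  # gambling sites that were missed


def confusion_counts(y_true, y_pred, positive="gambling"):
    """Count the four confusion-matrix cells for a binary task.

    Assumption: `positive` names the class treated as positive.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def evaluate(c):
    """Return accuracy, precision, recall, specificity, and F1-score."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Toy labels for illustration only; these are not the study's data.
y_true = ["gambling"] * 5 + ["normal"] * 3
y_pred = ["gambling", "gambling", "gambling", "gambling", "normal",
          "normal", "normal", "gambling"]

counts = confusion_counts(y_true, y_pred)
scores = evaluate(counts)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In this toy sample the F1-score (0.80) is higher than the accuracy (0.75), which can happen when the positive class outnumbers the negative one. The reported results show the same ordering, although the abstract does not state the class distribution of the test set.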
DOI: https://doi.org/10.52088/ijesty.v6i1.1794
Copyright (c) 2026 Rina Refianti, Husein Alhafiz