Smart Campus Dropout Prediction: Hybrid Features and Ensemble Approach
Abstract
Student dropout is a major concern in higher education, particularly within the smart campus ecosystem. This research designs a system for predicting students at risk of dropping out by integrating hybrid feature selection with ensemble learning, leveraging academic data and students' digital footprints. Model development begins with data cleaning and the selection of salient features through a combined approach: a filter-based method (mutual information) followed by recursive feature elimination. Classification models are then built with the XGBoost and Random Forest algorithms. Testing used a secondary dataset covering variables such as discussion participation, attendance rates, interaction with learning materials, and academic achievement. The XGBoost model achieved satisfactory performance, with an F1 score of 0.77 and a ROC AUC of 0.89. The confusion matrix recorded 67 correct predictions for students who graduated and 17 correct predictions for students who dropped out, with 12 misclassifications in total. These findings suggest that combining a hybrid feature selection strategy with XGBoost yields sufficiently accurate dropout predictions and has potential as an early warning system for more flexible and responsive smart campus governance.
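The two-stage pipeline described above (mutual-information filter, then recursive feature elimination, then a gradient-boosted classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn, substitutes `GradientBoostingClassifier` for XGBoost, and runs on synthetic data standing in for the student records; the feature counts and thresholds are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, RFE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix

# Synthetic stand-in for student records (attendance, discussion activity,
# material interaction, grades), with dropouts as the minority class.
X, y = make_classification(n_samples=480, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=42)

# Stage 1 (filter): rank features by mutual information with the label
# and keep the top k (k=12 is an illustrative choice).
mi = mutual_info_classif(X, y, random_state=42)
top_k = np.argsort(mi)[::-1][:12]
X_filtered = X[:, top_k]

X_train, X_test, y_train, y_test = train_test_split(
    X_filtered, y, test_size=0.2, stratify=y, random_state=42)

# Stage 2 (wrapper): recursive feature elimination driven by the
# boosted model's feature importances.
rfe = RFE(GradientBoostingClassifier(random_state=42), n_features_to_select=6)
rfe.fit(X_train, y_train)

# Stage 3: train the gradient-boosted classifier on the surviving features.
clf = GradientBoostingClassifier(random_state=42)
clf.fit(rfe.transform(X_train), y_train)

proba = clf.predict_proba(rfe.transform(X_test))[:, 1]
pred = (proba >= 0.5).astype(int)
print("F1:", round(f1_score(y_test, pred), 2))
print("ROC AUC:", round(roc_auc_score(y_test, proba), 2))
print(confusion_matrix(y_test, pred))
```

In practice the filter stage cheaply discards uninformative signals before the more expensive wrapper stage, which is the usual motivation for hybrid filter-plus-RFE designs.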
DOI: https://doi.org/10.52088/ijesty.v5i4.1183
Copyright (c) 2025 M Safii, Adli Abdillah Nababan, Husain Husain