Syaiful Anam
Mathematics Department, Brawijaya University, Malang, East Java, Indonesia.
Dian Eka Ratnawati
Informatics Engineering Department, Brawijaya University, Malang, East Java, Indonesia.
Satrio Hadi Wijoyo
Information System Department, Brawijaya University, Malang, East Java, Indonesia.
Nathanael Jeshua Paat
Mathematics Department, Brawijaya University, Malang, East Java, Indonesia.
DOI https://doi.org/10.33889/IJMEMS.2026.11.2.034
Abstract
One of the greatest challenges in medicine is the early prediction of Diabetes Mellitus (DM). The difficulty stems from the many clinical factors that influence prediction and from the limited generalizability of predictive models. This study addresses the problem with a prediction framework that combines a Random Forest Classifier with Binary Grey Wolf Optimization–based Feature Selection (RFC with BGWO-FS) and hyperparameter tuning. The proposed methodology is validated extensively on two independent datasets with distinct clinical features, giving a comprehensive test of the model's stability, generalizability, and clinical applicability. On the two datasets, RFC with BGWO-FS achieved test accuracies of 78.2% and 78.8% and F1 scores of 75.3% and 70.0%, respectively, the highest among all compared methods. Paired t-tests further confirm that its reductions in computational time relative to RFC with Genetic Algorithm-based feature selection (GA-FS) and RFC with Binary Particle Swarm Optimization-based feature selection (BPSO-FS) are significant. Output stability also does not differ across RFC with BGWO-FS configurations. Among the hyperparameter tuning approaches analyzed, Grid Search yielded the most stable generalization, whereas Bayesian Optimization incurred the least computational overhead. SHAP analysis confirmed the clinical relevance of the selected features, including general health, hypertension, cholesterol, glucose, and BMI. Overall, the results demonstrate that RFC with BGWO-FS is accurate and generalizable across both datasets, supporting its integration into diabetes screening workflows and clinical decision support systems.
Keywords- Diabetes mellitus prediction, Feature selection, Random forest classifier, Grey wolf optimization.
Citation
Anam, S., Ratnawati, D. E., Wijoyo, S. H., & Paat, N. J. (2026). Random Forest Classifier with Binary Grey Wolf Optimization Feature Selection for Predicting Diabetes Mellitus. International Journal of Mathematical, Engineering and Management Sciences, 11(2), 804-830. https://doi.org/10.33889/IJMEMS.2026.11.2.034
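The abstract names BGWO-FS as a wrapper feature selector around the RFC but does not state its update rules. The sketch below illustrates one commonly used binary GWO variant, in which a standard continuous GWO step toward the alpha, beta, and delta wolves is binarized by a sigmoid transfer function. It minimizes a weighted "error plus subset size" fitness. This is a minimal sketch under assumptions and not the authors' implementation: the transfer function, the 0.99 weight `ALPHA`, and `toy_fitness` (a hypothetical stand-in for validated RFC error, in which features 0, 2, and 5 are treated as informative) are all illustrative choices.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]
ALPHA = 0.99  # assumed weight of error vs. subset size in the wrapper fitness


def sigmoid(x: float) -> float:
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def toy_fitness(mask: Sequence[int]) -> float:
    """Hypothetical stand-in for RFC validation error: features 0, 2, 5 are 'informative'."""
    informative = {0, 2, 5}
    selected = {j for j, b in enumerate(mask) if b}
    if not selected:
        return 1.0  # an empty subset cannot train a classifier: worst score
    error = len(informative - selected) / len(informative)
    return ALPHA * error + (1 - ALPHA) * len(selected) / len(mask)


def _update_leaders(leaders, wolves, fitness):
    # Pool previous leaders with the current pack; keep the three best distinct masks.
    pool = {tuple(m): f for f, m in leaders}
    for w in wolves:
        key = tuple(w)
        if key not in pool:
            pool[key] = fitness(w)
    ranked = sorted(pool.items(), key=lambda kv: kv[1])[:3]
    return [(f, list(m)) for m, f in ranked]


def bgwo_feature_selection(
    fitness: Callable[[Sequence[int]], float],
    n_features: int,
    n_wolves: int = 10,
    n_iter: int = 50,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Minimise `fitness` over binary masks with a sigmoid-transfer binary GWO."""
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_wolves)]
    leaders = _update_leaders([], wolves, fitness)  # alpha, beta, delta
    history = [leaders[0][0]]

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # control parameter decreases linearly from 2
        new_wolves = []
        for w in wolves:
            new_w = []
            for j in range(n_features):
                x = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    x += leader[j] - A * D
                x /= len(leaders)
                # Sigmoid transfer turns the continuous step into a probability of bit = 1.
                new_w.append(1 if rng.random() < sigmoid(10.0 * (x - 0.5)) else 0)
            new_wolves.append(new_w)
        wolves = new_wolves
        leaders = _update_leaders(leaders, wolves, fitness)
        history.append(leaders[0][0])

    best_fit, best_mask = leaders[0]
    return best_mask, best_fit, history


if __name__ == "__main__":
    mask, fit, curve = bgwo_feature_selection(toy_fitness, n_features=10)
    print("selected:", [j for j, b in enumerate(mask) if b], "fitness:", round(fit, 4))
```

Replacing `toy_fitness` with a function that trains a Random Forest on the selected columns and returns the weighted validation error would turn this into a wrapper feature selector. Because previous leaders stay in the pool, the best-so-far fitness recorded in `history` can never worsen.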