International Journal of Mathematical, Engineering and Management Sciences

ISSN: 2455-7749

A Classification System for Diabetic Patients with Machine Learning Techniques


Vandana Rawat
Department of Computer Applications, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India.

IFP Energies Nouvelles (IFPEN), Lyon, France.

DOI https://dx.doi.org/10.33889/IJMEMS.2019.4.3-057

Received on September 21, 2018
Accepted on April 04, 2019


Diabetes mellitus (DM) is a group of metabolic disorders characterized by high blood glucose levels sustained over a prolonged period. It results from defective insulin production or from an improper response of the body's cells to the insulin produced: diabetes arises when the pancreas does not produce enough of the hormone insulin or when the body cannot use insulin properly. It is one of the most significant public health care challenges worldwide. The study of diabetes (diagnosis, etiopathophysiology, therapy, etc.) requires generating and processing vast amounts of data. Data mining techniques have proven useful and effective for uncovering unknown relationships or patterns, if any exist, in such data. In the present work, five machine learning techniques, namely AdaBoost, LogicBoost, RobustBoost, Naïve Bayes and Bagging, are proposed for the analysis and prediction of DM patients. The techniques are applied to the Pima Indians Diabetes data set. Bagging and AdaBoost achieve classification accuracies of 81.77% and 79.69%, respectively. Hence, the proposed techniques are effective and efficient for predicting DM.

Keywords- Bagging, Boosting techniques, Diabetes mellitus (DM), Machine learning techniques, Naive Bayes Classifier, RobustBoost techniques, Prediction.
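
To illustrate two of the ensemble methods named in the abstract, the following is a minimal pure-Python sketch of discrete AdaBoost and of bagging, both built on decision stumps. It is not the authors' implementation. The toy one-feature data set, the choice of stumps as base learners, and all function names are assumptions made for illustration only; the Pima Indians Diabetes data set is not used here.

```python
import math
import random

def stump_predict(stump, x):
    """Decision stump: threshold one feature, return a label in {-1, +1}."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Return the stump with the lowest weighted error, and that error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(X, y, rounds=3):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1

def bagging(X, y, n_models=15, seed=0):
    """Bagging: fit one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    model = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        stump, _ = fit_stump(Xb, yb, [1.0 / n] * n)
        model.append(stump)
    return model

def bagging_predict(model, x):
    """Unweighted majority vote of the bagged stumps."""
    votes = sum(stump_predict(stump, x) for stump in model)
    return 1 if votes >= 0 else -1

def accuracy(predict, model, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(X)

# Hypothetical toy data: one feature, labels that no single stump can separate.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

print("single stump :", accuracy(adaboost_predict, adaboost(X, y, rounds=1), X, y))
print("AdaBoost (3) :", accuracy(adaboost_predict, adaboost(X, y, rounds=3), X, y))
print("Bagging (15) :", accuracy(bagging_predict, bagging(X, y), X, y))
```

The accuracies printed here are training accuracies on the toy data and say nothing about the figures reported in the abstract; a faithful comparison would need the actual data set and a held-out evaluation protocol.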


Rawat, V., & Suryakan (2019). A Classification System for Diabetic Patients with Machine Learning Techniques. International Journal of Mathematical, Engineering and Management Sciences, 4(3), 729-744. https://dx.doi.org/10.33889/IJMEMS.2019.4.3-057.

Conflict of Interest

The authors confirm that there is no conflict of interest to declare for this publication.


The authors would like to express their sincere thanks to the Graphic Era Deemed to be University for providing the resources and support to complete this paper.

