Tanujit Chakraborty
SQC & OR Unit Indian Statistical Institute, Kolkata, 700108, India.
DOI https://dx.doi.org/10.33889/IJMEMS.2019.4.4-068
Abstract
Private business schools in India face the recurring problem of selecting quality students for their MBA programs in order to achieve the desired placement percentage. Such datasets are generally biased towards one class, i.e., imbalanced in nature, and learning from an imbalanced dataset is a difficult proposition. This paper proposes an imbalanced ensemble classifier that handles the imbalanced nature of the dataset and achieves higher accuracy on the combined feature selection (selection of important characteristics of students) and classification problem (prediction of placements based on the students' characteristics) for an Indian business school dataset. The optimal value of an important model parameter is also found. Experimental evidence on the Indian business school dataset is provided to demonstrate the outstanding performance of the proposed imbalanced ensemble classifier.
Keywords- Business school problem, Imbalanced data, Hellinger distance, Ensemble classifier.
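The abstract does not detail how the proposed classifier is built, but the Hellinger distance named in the keywords is commonly used as a skew-insensitive splitting criterion in decision trees for imbalanced data. The sketch below illustrates that standard criterion for a candidate binary split; it is not the paper's exact method, and the function name and counts-based interface are assumptions for illustration.

```python
import math

def hellinger_split_value(left_pos, left_neg, right_pos, right_neg):
    """Hellinger distance between the class-conditional distributions
    induced by a binary split. Because it compares within-class rates
    rather than class priors, it is insensitive to class skew."""
    total_pos = left_pos + right_pos  # all minority-class examples
    total_neg = left_neg + right_neg  # all majority-class examples
    d_squared = (
        (math.sqrt(left_pos / total_pos) - math.sqrt(left_neg / total_neg)) ** 2
        + (math.sqrt(right_pos / total_pos) - math.sqrt(right_neg / total_neg)) ** 2
    )
    return math.sqrt(d_squared)

# A split that perfectly separates the classes attains the maximum
# value sqrt(2); a split that leaves both sides with the original
# class mix scores 0, regardless of how imbalanced the classes are.
```

A tree grown with this criterion chooses, at each node, the split maximizing this distance, which is one way to keep a severely under-represented class (e.g., non-placed students) from being ignored.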
Citation
Chakraborty, T. (2019). Imbalanced Ensemble Classifier for Learning from Imbalanced Business School Dataset. International Journal of Mathematical, Engineering and Management Sciences, 4(4), 861-869. https://dx.doi.org/10.33889/IJMEMS.2019.4.4-068.
Conflict of Interest
The author declares that there is no conflict of interest for this publication.
Acknowledgements
The author would like to express his sincere thanks to the referees and the editor for their valuable suggestions, which improved the quality of the paper.