International Journal of Mathematical, Engineering and Management Sciences

ISSN: 2455-7749


Fine-Tuned Pre-Trained Model for Script Recognition

Mamta Bisht
Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, India.

Richa Gupta
Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, India.

DOI https://doi.org/10.33889/IJMEMS.2021.6.5.078

Received on April 11, 2021; Accepted on July 25, 2021

Abstract

Script recognition is a necessary preliminary step for text recognition. In the deep learning era, this task has two essential requirements: a large labeled dataset for training and the computational resources to train models. When either is limited, alternative methods are needed. This motivates transfer learning, in which the knowledge a model has acquired on a benchmark dataset is reused on a smaller dataset for a different task, saving computation because only a small fraction of the model's parameters has to be trained. Here we study two pre-trained models and fine-tune them for script classification tasks. First, the VGG-16 pre-trained model is fine-tuned on the publicly available CVSI-15 and MLe2e datasets for script recognition. Second, a model that performs well on a Devanagari handwritten character dataset is adopted and fine-tuned on the Kaggle Devanagari numeral dataset for numeral recognition. The performance of the proposed fine-tuned models depends on whether the target dataset is similar or dissimilar to the original dataset, and it is analyzed with widely used optimizers.

Keywords- Transfer learning, Fine-tuning, Deep learning, CNN, VGG-16 model, Script classification.
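
The parameter saving mentioned in the abstract can be illustrated with a small counting sketch. It is not the authors' configuration: it assumes the 13 convolutional layers of VGG-16 (Simonyan & Zisserman, 2014) are frozen, and that a hypothetical new classifier head (one hidden dense layer of 256 units on a 7 x 7 x 512 feature map, which corresponds to a 224 x 224 input) is the only part that is trained. The names `count_parameters`, `hidden_units`, and `num_classes` are illustrative.

```python
# (in_channels, out_channels) of each 3x3 convolution in the VGG-16 base.
VGG16_CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_parameters(num_classes, hidden_units=256, feature_size=7 * 7 * 512):
    """Return (frozen, trainable) parameter counts.

    Assumption: the convolutional base is frozen and only a hypothetical
    two-layer dense head is trained on the target dataset.
    """
    frozen = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONV_LAYERS)
    trainable = (dense_params(feature_size, hidden_units)
                 + dense_params(hidden_units, num_classes))
    return frozen, trainable


# num_classes=10 is an illustrative value, not taken from the abstract.
frozen, trainable = count_parameters(num_classes=10)
share = trainable / (frozen + trainable)
print(f"frozen: {frozen:,}  trainable: {trainable:,}  trainable share: {share:.1%}")
```

Under these assumptions the frozen base holds 14,714,688 parameters, so gradient updates are restricted to the much smaller head; the exact split in the paper depends on which layers the authors chose to unfreeze.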

Citation

Bisht, M., & Gupta, R. (2021). Fine-Tuned Pre-Trained Model for Script Recognition. International Journal of Mathematical, Engineering and Management Sciences, 6(5), 1297-1314. https://doi.org/10.33889/IJMEMS.2021.6.5.078.

Conflict of Interest

The authors declare that there is no conflict of interest regarding the publication of this work.

Acknowledgements

The authors are grateful to the editor and reviewers for their helpful suggestions.
