Automated Estimation of Difficulty Levels in Math Word Problems using Linguistic, Mathematical, and Semantic Features

Shilpa Kadam; Shoukhi Khan; Jabez Christopher; PTV Praveen Kumar; Dipak Kumar Satpathi

DOI: https://doi.org/10.33889/IJMEMS.2026.11.2.028

Shilpa Kadam
Department of Mathematics, BITS Pilani, Hyderabad Campus, Telangana State, India.

Shoukhi Khan
IBM, Bengaluru, Karnataka, India.

Jabez Christopher
Department of Computer Science and Information Systems, BITS Pilani, Hyderabad Campus, Telangana State, India.

PTV Praveen Kumar
Department of Mathematics, BITS Pilani, Hyderabad Campus, Telangana State, India.

Dipak Kumar Satpathi
Department of Mathematics, BITS Pilani, Hyderabad Campus, Telangana State, India.

Received on December 04, 2025; Accepted on February 04, 2026

Abstract

Math Word Problems (MWPs) remain challenging for learners because of linguistic complexity, the demands of mathematical reasoning, and contextual variability. Accurately estimating item difficulty is essential for adaptive learning and automated assessment, yet many existing approaches rely on expert annotation or Item Response Theory (IRT), both of which are resource-intensive and difficult to scale to new items. This paper proposes IDEA, an integrated data-driven framework that extracts linguistic, mathematical, and semantic embedding features to predict MWP difficulty on a five-level scale. Using 4,244 algebra problems from the MATH dataset, we evaluate multiple feature sets and models and show that embedding-based representations outperform handcrafted features on a held-out test set (Macro-F1 = 0.40 vs. 0.29). Because difficulty levels are ordinal, we also report ordinal-aware evaluation: an ordinal regression model achieves a mean absolute error (MAE) of 1.08, a quadratic weighted kappa of 0.37, and a within-one-level accuracy of 0.71, indicating that most predictions fall close to the true level even when exact matches are difficult. An interpretability analysis using SHAP identifies readability and sentence-structure features as the dominant contributors to predicted difficulty; an exploratory structural equation modeling (SEM) analysis examines relationships among feature groups but is interpreted cautiously because of its limited global fit. Finally, external validation using seven expert ratings and IRT estimates from 61 students reveals variability in human judgment while supporting the practical utility of automated calibration. Overall, IDEA provides a scalable approach to item calibration, helps mitigate cold-start challenges in adaptive learning settings, and can contribute to fine-tuning large language models.
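The abstract reports one nominal metric (Macro-F1) and three ordinal-aware metrics (MAE, quadratic weighted kappa, and within-one-level accuracy). The sketch below shows how these quantities are commonly computed for integer difficulty levels 1 to 5. It is a minimal, dependency-free illustration, not the authors' evaluation code; the function names and the toy labels are hypothetical. scikit-learn's `f1_score(average="macro")`, `mean_absolute_error`, and `cohen_kappa_score(weights="quadratic")` compute the same quantities.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over the given labels."""
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def mean_absolute_error(y_true, y_pred):
    """Average distance, in difficulty levels, between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def within_one_accuracy(y_true, y_pred):
    """Fraction of predictions at most one level away from the true level."""
    return sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= 1) / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, labels):
    """Cohen's kappa with quadratic disagreement weights.

    `labels` must list every level that can appear (e.g. 1..5), in order.
    """
    k = len(labels)
    index = {c: i for i, c in enumerate(labels)}
    n = len(y_true)
    # Observed confusion matrix (counts).
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[index[t]][index[p]] += 1
    # Marginal histograms, used for the chance-agreement (expected) matrix.
    true_counts = Counter(index[t] for t in y_true)
    pred_counts = Counter(index[p] for p in y_pred)
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = true_counts[i] * pred_counts[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0.0:
        # Only happens when truth and predictions are the same single level.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical toy labels on the 1-5 scale, not data from the paper.
    y_true = [1, 2, 3, 4, 5, 3]
    y_pred = [1, 3, 3, 5, 4, 2]
    levels = [1, 2, 3, 4, 5]
    print(round(macro_f1(y_true, y_pred, levels), 3))  # 0.3
    print(round(mean_absolute_error(y_true, y_pred), 3))  # 0.667
    print(round(quadratic_weighted_kappa(y_true, y_pred, levels), 3))  # 0.8
    print(within_one_accuracy(y_true, y_pred))  # 1.0
```

The toy example illustrates the point made in the abstract: every prediction is within one level of the truth (within-one accuracy 1.0) even though Macro-F1 is low. Quadratic weights penalize a prediction two levels away four times as heavily as one level away, which is why quadratic weighted kappa complements exact-match metrics for ordinal targets.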

Keywords: Math word problems, Item difficulty, Item response theory, Adaptive learning systems, Data-driven approach, Word embeddings, Difficulty estimation, SHAP.

Citation

Kadam, S., Khan, S., Christopher, J., Kumar, P. P., & Satpathi, D. K. (2026). Automated Estimation of Difficulty Levels in Math Word Problems using Linguistic, Mathematical, and Semantic Features. International Journal of Mathematical, Engineering and Management Sciences, 11(2), 679-705. https://doi.org/10.33889/IJMEMS.2026.11.2.028

Volume 11 (2026)

Number 2 (April)

Pages 679-705

International Journal of Mathematical, Engineering and Management Sciences

eISSN: 2455-7749. Open Access