Early prediction of children at risk for literacy difficulties: An Explainable AI approach
Abstract
Early identification of children at risk for literacy difficulties is critical for enabling timely interventions and improving long-term educational outcomes. Current methods for predicting emergent literacy competences in early childhood remain limited in accuracy and often neglect the complex interplay of developmental and contextual factors. This study adopts a theory-driven, explainable machine learning (ML) approach to model kindergarten literacy outcomes using data from infancy and toddlerhood. Drawing on a longitudinal dataset of 203 Greek-speaking children aged 6–36 months, we trained multiple ML classifiers on features capturing early communication and language skills and several contextual factors. The Extra Trees classifier yielded the highest predictive performance, with an F1 score of .72, and explainable AI techniques (SHAP) revealed theoretical insights into the factors underlying early literacy competences. Early communication and receptive language skills, along with parental time involvement, emerged as important protective factors against literacy difficulties, while advanced lexical and morphosyntactic skills emerged as key predictors of advanced literacy outcomes. Interestingly, contextual indicators such as parental education or the home learning environment did not show added predictive value. These findings demonstrate the feasibility of combining early developmental markers and contextual factors with ML to anticipate literacy trajectories, offering a promising avenue for proactive intervention strategies in early childhood education. They also illustrate the utility of explainable AI for obtaining theoretical insights into the complex interactions that shape early literacy development.