Page 79 - RASAS 2025

10th Ruhuna Arts Student’s Annual Sessions (RASAS) - 2025

Automated Classification of Classroom Engagement Levels via Deep Learning
and Pose Estimation

R.D.M.P. Kumara*¹, M.I.F. Sasniya², and W.D.D. Alahakoon³
Department of Information Technology, University of Ruhuna
¹Madushanprabod129@gmail.com

ABSTRACT

Academic participation plays a vital role in academic performance. Conventional observation methods
are subjective, inaccurate, and impractical for large classrooms and real-time monitoring. This study
presents a Deep Learning (DL) system that classifies student engagement into three levels (high, medium,
and low) using the “EduAction” dataset. The dataset comprises about 718 annotated classroom photograph
sequences across seven action categories (drinking, listening to a lecture, using a phone, sleeping,
talking, watching a computer, and writing). Engagement levels were mapped onto these actions
systematically, providing a structured labeling strategy. The preprocessing pipeline involved frame
extraction, privacy-enhancing face blurring, pose estimation, and data augmentation to improve resilience
to lighting changes, occlusions, and seating configurations. Convolutional Neural Networks (CNNs) (e.g.,
ResNet18) and a Vision Transformer (ViT) architecture (ViT-B/16) were trained and compared, and hybrid
fusion models were also explored. Cross-validation across classroom sessions supported the robustness of
the results, while the privacy-sensitive practices addressed ethical concerns by avoiding reliance on
identifiable faces. The experimental results indicated that the ViT outperformed the CNN baselines,
achieving an overall accuracy of 87%, with precision, recall, and F1-scores all above 85%. Pose-only and
hybrid models were also used to show that privacy-preserving engagement recognition is feasible. Grad-CAM
visualizations showed that the models learned to attend to meaningful posture and gesture cues rather
than sensitive identity features. Temporal consistency analysis and real-time inference experiments
demonstrated the applicability of the system in live classrooms. These findings confirm that DL models
trained on the “EduAction” dataset can provide accurate classroom engagement monitoring. The research
contributes to AI-driven educational analytics, providing educators with data-driven guidance on teaching
approaches, improving the learning process, and maximizing learning outcomes.
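
As an illustration of the labeling strategy and the reported metrics, the following minimal Python sketch maps the seven action categories onto the three engagement levels and computes macro-averaged precision, recall, and F1-score. The specific action-to-level assignments, the function names, and the use of macro averaging are assumptions made for this example; the abstract does not state them.

```python
# Hypothetical action-to-engagement mapping. The abstract names the seven
# actions and the three levels but does not say which action maps to which
# level, so these assignments are illustrative assumptions only.
ACTION_TO_ENGAGEMENT = {
    "writing": "high",
    "listening to a lecture": "high",
    "watching a computer": "medium",
    "drinking": "medium",
    "talking": "medium",
    "using a phone": "low",
    "sleeping": "low",
}

LEVELS = ("high", "medium", "low")


def engagement_level(action):
    """Return the (assumed) engagement level for an action label."""
    return ACTION_TO_ENGAGEMENT[action]


def macro_scores(y_true, y_pred, labels=LEVELS):
    """Compute macro-averaged precision, recall, and F1 over the labels."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy ground-truth actions and toy predicted levels, not real results.
    actions = ["writing", "listening to a lecture", "talking", "sleeping"]
    y_true = [engagement_level(a) for a in actions]
    y_pred = ["high", "medium", "medium", "low"]
    print(macro_scores(y_true, y_pred))
```

Keeping the mapping in a single dictionary means the labeling strategy can be revised without touching the classifier or the evaluation code.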

Keywords: Academic Engagement Classification, Deep Learning (DL), Educational Analytics, Image
Classification, Vision Transformer
