
10th Ruhuna Arts Student’s Annual Sessions (RASAS) - 2025



                 Automated Classification of Classroom Engagement Levels via Deep Learning
                                                   and Pose Estimation

                                R.D.M.P. Kumara*¹, M.I.F. Sasniya², and W.D.D. Alahakoon³
                                    Department of Information Technology, University of Ruhuna
                                                  ¹Madushanprabod129@gmail.com

               ABSTRACT

               Academic participation plays a vital role in academic performance. Conventional observation methods are
               subjective, inaccurate, and impractical for large classrooms and real-time monitoring. This study presents
               a Deep Learning (DL) system that classifies student engagement into three levels (high, medium, and low)
               using the “EduAction” dataset, which comprises about 718 annotated classroom image sequences across
               seven action categories (drinking, listening to a lecture, using a phone, sleeping, talking, watching a
               computer, and writing). Engagement levels were mapped systematically onto these actions, providing a
               structured labeling strategy. The preprocessing pipeline involved frame extraction, privacy-enhancing
               face blurring, pose estimation, and data augmentation to improve resilience to lighting changes,
               occlusions, and seating configurations. Convolutional Neural Networks (CNNs) (e.g., ResNet18) and a
               Vision Transformer (ViT) architecture (ViT-B/16) were trained and compared, and hybrid fusion models were
               also examined. Cross-validation across classroom sessions supported the robustness of the results, while
               privacy-sensitive practices upheld ethical standards by avoiding reliance on identifiable faces. The
               experimental results indicated that the ViT outperformed the CNN baselines, achieving an overall accuracy
               of 87%, with precision, recall, and F1-scores all above 85%. Pose-only and hybrid models were also
               evaluated to show that privacy-preserving engagement recognition is feasible. Grad-CAM visualizations
               established that the models learned to attend to meaningful posture and gesture cues rather than
               sensitive identity features. Temporal consistency analysis and real-time inference experiments
               demonstrated the applicability of the system in live classrooms. These findings confirm that DL models
               trained on the “EduAction” dataset can provide accurate solutions for classroom engagement monitoring.
               The research contributes to AI-driven educational analytics, providing educators with data-driven
               guidance on teaching approaches, improving the learning process, and maximizing learning outcomes.
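
               The labeling strategy described in the abstract, in which each of the seven annotated actions is
               assigned to one of three engagement levels, can be illustrated with a minimal sketch. The abstract
               does not state which action maps to which level, so the pairings below are purely hypothetical, and
               the names ACTION_TO_ENGAGEMENT, engagement_level, and level_distribution are introduced here only
               for illustration.

```python
from collections import Counter

# HYPOTHETICAL mapping: the abstract does not publish the actual assignment
# of actions to engagement levels, so these pairings are assumptions.
ACTION_TO_ENGAGEMENT = {
    "writing": "high",
    "listening to a lecture": "high",
    "watching a computer": "medium",
    "talking": "medium",
    "drinking": "medium",
    "using a phone": "low",
    "sleeping": "low",
}


def engagement_level(action: str) -> str:
    """Return the engagement level assumed for an annotated action label."""
    key = action.strip().lower()
    if key not in ACTION_TO_ENGAGEMENT:
        raise ValueError(f"unknown action label: {action!r}")
    return ACTION_TO_ENGAGEMENT[key]


def level_distribution(actions):
    """Count how many annotated samples fall into each engagement level."""
    return Counter(engagement_level(a) for a in actions)


if __name__ == "__main__":
    sample = ["writing", "sleeping", "talking", "using a phone"]
    print(level_distribution(sample))
```

               Keeping the mapping in a single lookup table means that a different assignment of actions to
               levels only requires editing the dictionary, not the labeling code.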

               Keywords: Academic Engagement Classification, Deep Learning (DL), Educational Analytics, Image
               Classification, Vision Transformer






