Data Infrastructure Application in Education: An Integrated Architecture for Secure Learning Analytics and Student Performance Prediction

Authors

  • Dinesh Pranav Mukerjea, Dhaka International University

DOI:

https://doi.org/10.58776/ijitcsa.v4i1.245

Keywords:

Data Warehouse, Database, Data Analytics

Abstract

Data infrastructure has become a strategic backbone of contemporary education because digital learning environments continuously generate student traces that can be transformed into actionable evidence for teaching, advising, and institutional planning. Yet the practical value of educational data depends on much more than storage capacity. Institutions must integrate heterogeneous sources, manage raw and curated data simultaneously, enforce privacy constraints, and deliver analytics outputs that are operationally useful and ethically defensible. This study develops a layered educational data infrastructure architecture that connects raw learning data, extract-transform-load (ETL) processes, governance mechanisms, curated analytics repositories, and machine-learning services. The paper includes a reproducible empirical evaluation on the real-world xAPI-Edu-Data benchmark collected from the Kalboard 360 learning management environment. Three machine-learning models are compared under a common preprocessing pipeline, and an ablation analysis quantifies the incremental value of integrated behavioral, parental, and contextual features. The best-performing model achieves a test macro-F1 of 0.797 and a macro one-vs-rest ROC-AUC of 0.919, while the ablation study shows that the full integrated feature set clearly outperforms the demographic-only and behavior-only alternatives. The paper contributes a structured architecture, a mathematical formalization of integrated learning analytics, and empirical evidence that richer, better-governed data pipelines produce more useful predictive signals for educational decision support.
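For readers unfamiliar with the two headline metrics, the following is a minimal pure-Python sketch of how a macro-F1 and a macro one-vs-rest ROC-AUC can be computed for a multi-class classifier. It is not the paper's evaluation code: the function names (`macro_f1`, `binary_auc`, `macro_ovr_auc`), the toy labels, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(is_positive, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as one half)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, labels):
    """Macro one-vs-rest ROC-AUC.

    proba[i][k] is the predicted probability of labels[k] for example i.
    Each class is scored against all others, then the AUCs are averaged.
    """
    aucs = []
    for k, c in enumerate(labels):
        is_positive = [t == c for t in y_true]
        class_scores = [row[k] for row in proba]
        aucs.append(binary_auc(is_positive, class_scores))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Hypothetical toy data: three performance classes, six students.
    labels = ["L", "M", "H"]
    y_true = ["L", "L", "M", "M", "H", "H"]
    proba = [
        [0.7, 0.2, 0.1],
        [0.4, 0.5, 0.1],
        [0.2, 0.6, 0.2],
        [0.1, 0.7, 0.2],
        [0.1, 0.2, 0.7],
        [0.5, 0.2, 0.3],
    ]
    # Predicted class = label with the highest probability.
    y_pred = [labels[row.index(max(row))] for row in proba]
    print("macro-F1:", macro_f1(y_true, y_pred))
    print("macro OvR ROC-AUC:", macro_ovr_auc(y_true, proba, labels))
```

Both metrics average over classes without weighting by class size, so a model that performs poorly on a small class is penalized as much as one that performs poorly on a large class.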

Published

29-03-2026

How to Cite

Mukerjea, D. P. (2026). Data Infrastructure Application in Education: An Integrated Architecture for Secure Learning Analytics and Student Performance Prediction. International Journal of Information Technology and Computer Science Applications, 4(1), 21–34. https://doi.org/10.58776/ijitcsa.v4i1.245

Section

New Submission