Revisiting the IBM Retail Data Warehouse: A Governed One-Column Architecture and Reproducible Open-Dataset Validation for Retail Analytics

Authors

  • Nayananda Karunaratne, Sabaragamuwa University of Sri Lanka
  • Pulasthi Medhananda, Sabaragamuwa University of Sri Lanka

DOI:

https://doi.org/10.58776/ijitcsa.v4i1.247

Keywords:

Retail data warehouse, Dimensional modeling, Data governance, Business intelligence, Retail analytics, Demand forecasting

Abstract

The IBM Retail Data Warehouse (RDW) proposal correctly recognized the importance of integrated retail data, but it remained largely descriptive, did not formalize the underlying architecture, and lacked reproducible empirical validation. This paper reconstructs and substantially extends that early proposal. We first synthesize the historical IBM RDW, Retail Data Warehouse Model (RDWM), Retail Services Data Model (RSDM), and Retail Business Solution Template (RBST) concepts with the contemporary literature on data warehousing, data governance, and retail analytics. We then propose a governed, RDW-informed logical architecture that separates ingestion, quality control, conformed dimensional modeling, analytics marts, and decision-support services. To move beyond conceptual discussion, we instantiate the architecture with an open retail dataset from the UCI Machine Learning Repository containing 541,909 transactions. After governance-oriented preprocessing, the final analytical mart contains 392,692 valid rows, 18,532 orders, 4,338 customers, 3,665 products, and 37 countries. We formulate the transformation and forecasting workflow mathematically, define an end-to-end algorithmic pipeline, and evaluate a retail revenue forecasting task against naive, seasonal naive, linear regression, ridge regression, random forest, and gradient boosting baselines. On the hold-out test window, the best model (linear regression on warehouse-engineered features) achieves an RMSE of 4,302.61 GBP and R² = 0.9766, whereas a raw, ungoverned pipeline yields a much weaker RMSE of 10,068.59 GBP. This corresponds to a 57.27% reduction in RMSE, which we attribute to governance and dimensional integration. The results show that the practical value of an RDW-like architecture is not merely organizational: when implemented as a governed analytical platform, it measurably improves reproducibility, interpretability, and forecasting quality.
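
The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical and not taken from the paper, and the only inputs are the numbers quoted above. It computes the relative RMSE reduction of the governed pipeline over the raw one, and the share of source transactions that remain in the analytical mart after preprocessing.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Figures quoted in the abstract (GBP); the names are illustrative assumptions.
RAW_RMSE_GBP = 10_068.59       # raw, ungoverned pipeline
GOVERNED_RMSE_GBP = 4_302.61   # linear regression on warehouse-engineered features

SOURCE_ROWS = 541_909          # transactions in the open dataset
MART_ROWS = 392_692            # valid rows after governance-oriented preprocessing

reduction_pct = relative_reduction(RAW_RMSE_GBP, GOVERNED_RMSE_GBP)
retained_share = MART_ROWS / SOURCE_ROWS

print(f"RMSE reduction: {reduction_pct:.2f}%")   # matches the 57.27% in the abstract
print(f"Rows retained:  {retained_share:.1%}")
```

The reduction is measured relative to the raw-pipeline RMSE, which is the convention that reproduces the 57.27% stated in the abstract.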

Published

13-04-2026

How to Cite

Karunaratne, N., & Medhananda, P. (2026). Revisiting the IBM Retail Data Warehouse: A Governed One-Column Architecture and Reproducible Open-Dataset Validation for Retail Analytics. International Journal of Information Technology and Computer Science Applications, 4(1), 59–69. https://doi.org/10.58776/ijitcsa.v4i1.247

Section

New Submission