Revisiting the IBM Retail Data Warehouse: A Governed One-Column Architecture and Reproducible Open-Dataset Validation for Retail Analytics
DOI: https://doi.org/10.58776/ijitcsa.v4i1.247
Keywords: Retail data warehouse, Dimensional modeling, Data governance, Business intelligence, Retail analytics, Demand forecasting
Abstract
The IBM Retail Data Warehouse (RDW) correctly recognized the importance of integrated retail data, but it remained largely descriptive, did not formalize the underlying architecture, and lacked a reproducible empirical validation. This paper reconstructs and substantially extends that early proposal into a publication-ready research article. We first synthesize the historical IBM RDW, Retail Data Warehouse Model (RDWM), Retail Services Data Model (RSDM), and Retail Business Solution Template (RBST) concepts with contemporary data warehousing, data governance, and retail analytics literature. We then propose a governed, RDW-informed logical architecture that separates ingestion, quality control, conformed dimensional modeling, analytics marts, and decision-support services. To move beyond conceptual discussion, we instantiate the architecture with an open retail dataset from the UCI Machine Learning Repository containing 541,909 transactions. After governance-oriented preprocessing, the final analytical mart contains 392,692 valid rows, 18,532 orders, 4,338 customers, 3,665 products, and 37 countries. We formulate the transformation and forecasting workflow mathematically, define an end-to-end algorithmic pipeline, and evaluate a retail revenue forecasting task using naive, seasonal naive, linear regression, ridge regression, random forest, and gradient boosting baselines. On the hold-out test window, the best model (linear regression on warehouse-engineered features) achieves an RMSE of 4,302.61 GBP and R² = 0.9766, while a raw, ungoverned pipeline yields a much weaker RMSE of 10,068.59 GBP. This corresponds to a 57.27% reduction in RMSE attributable to governance and dimensional integration. The results show that the practical value of an RDW-like architecture is not merely organizational; when implemented as a governed analytical platform, it measurably improves reproducibility, interpretability, and forecasting quality.
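The relative RMSE reduction quoted in the abstract can be checked with a few lines of Python, and the two simplest baselines it names (naive and seasonal naive) are easy to sketch. The snippet below is only an illustration and not the authors' pipeline: the function names, the toy revenue series, and the weekly season length of 7 are assumptions introduced here. Only the two RMSE figures come from the abstract.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def naive_forecast(history, horizon):
    """Naive baseline: repeat the last observed value for every step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season=7):
    """Seasonal naive baseline: repeat the value from one season earlier.

    season=7 (a weekly cycle on daily data) is an assumption for this sketch.
    """
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


# RMSE figures reported in the abstract (GBP).
raw_rmse = 10068.59
governed_rmse = 4302.61
print(f"RMSE reduction: {relative_reduction(raw_rmse, governed_rmse):.2f}%")

# Hypothetical toy series: two identical weeks of daily revenue.
history = [100, 120, 90, 110, 130, 60, 0] * 2
actual = [100, 120, 90, 110, 130, 60, 0]
print("naive RMSE:", rmse(actual, naive_forecast(history, 7)))
print("seasonal naive RMSE:", rmse(actual, seasonal_naive_forecast(history, 7)))
```

On the toy series the seasonal naive forecast reproduces the repeated weekly pattern exactly, while the naive forecast does not, which is why both are commonly used as reference points before fitting regression or tree-based models.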
Copyright (c) 2026 Nayananda Karunaratne, Pulasthi Medhananda

This work is licensed under a Creative Commons Attribution 4.0 International License.