Clustering and Sales Prediction Using K-Means and Simple Linear Regression
DOI:
https://doi.org/10.58776/ijitcsa.v4i2.209
Keywords:
Clustering, K-Means, Sales Prediction, Simple Linear Regression, Customer Segmentation
Abstract
CV. Cipta Usaha Selaras faces challenges in identifying customer purchasing patterns and accurately projecting sales values. The research is motivated by the company's need for data-driven marketing strategies and efficient operational planning. This study applies the K-Means algorithm to cluster customers by purchase frequency and total transaction value, and Simple Linear Regression to predict total purchases from transaction frequency. The data consist of 358 sales transaction entries from 2024. Clustering yields three customer segments with distinct characteristics and a Silhouette Score of 0.7913, indicating good segmentation quality. The regression model has a coefficient of determination (R²) of 0.6910, an MAE of IDR 213 million, and an MSE of IDR 206 trillion. These results indicate that the approach gives a reasonably representative picture of customer purchasing behavior. The research contributes to data-driven decision-making within the company, particularly in developing marketing strategies and estimating potential revenue.
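The pipeline summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the customer rows are hypothetical, and the min-max scaling, the deterministic farthest-point seeding, and all function names are assumptions made for illustration, since the abstract does not describe the preprocessing or initialization used.

```python
import math

def min_max_scale(rows):
    """Rescale every column to [0, 1] so neither feature dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - l) / s for v, l, s in zip(row, lo, span)) for row in rows]

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then move each
    centroid to the mean of its members, until the centroids stop changing."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b), where a is the mean distance to the point's
    own cluster and b is the mean distance to the nearest other cluster."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:      # singleton cluster or only one cluster
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def regression_metrics(y, y_hat):
    """Return (R squared, MAE, MSE) for observed y and predicted y_hat."""
    n = len(y)
    my = sum(y) / n
    mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / n
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - (mse * n) / ss_tot, mae, mse

# Hypothetical customers: (purchase frequency, total purchases in IDR millions).
customers = [(2, 15.0), (3, 22.0), (2, 18.0), (3, 20.0),
             (10, 110.0), (11, 95.0), (12, 120.0), (10, 100.0),
             (25, 400.0), (27, 380.0), (26, 420.0), (24, 390.0)]

scaled = min_max_scale(customers)
labels, centroids = kmeans(scaled, k=3)
score = silhouette_score(scaled, labels)

frequency = [f for f, _ in customers]
total = [t for _, t in customers]
slope, intercept = fit_simple_linear_regression(frequency, total)
predicted = [intercept + slope * f for f in frequency]
r2, mae, mse = regression_metrics(total, predicted)

print("cluster labels:", labels)
print("silhouette score:", round(score, 4))
print("total = %.2f + %.2f * frequency" % (intercept, slope))
print("R2, MAE, MSE:", round(r2, 4), round(mae, 2), round(mse, 2))
```

A Silhouette Score close to 1 means points sit much nearer to their own cluster than to the next one. MSE is expressed in squared units of the target, so its square root (RMSE) is the figure directly comparable to MAE.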
License
Copyright (c) 2026 Tia Aulia, Wowon Priatna, Muhammad Yasir

This work is licensed under a Creative Commons Attribution 4.0 International License.