Synthetic Data for Business Intelligence: A New Paradigm for Privacy-Preserving Machine Learning in Enterprise Environments
Abstract
The growing demand for data-driven decision-making in enterprises creates a tension between the use of machine learning (ML) and data privacy. This paper examines the feasibility of using synthetic data in place of real enterprise data in business intelligence (BI) applications. Synthetic datasets were generated with CTGAN, Variational Autoencoders (VAE), and diffusion models, and were evaluated on fraud detection and customer segmentation tasks. Empirical findings indicate that XGBoost trained on synthetic data achieved an accuracy of 97% and an ROC AUC of 0.94, close to the performance achieved with real data. CTGAN showed high fidelity, with Wasserstein distances below 0.15 and Jensen-Shannon divergence below 0.08. Dimensionality-reduction visualisations confirmed substantial structural similarity between the real and synthetic data. Privacy analyses yielded Nearest Neighbour Adversarial Distance (NNAD) scores of 0.38 for CTGAN and 0.36 for diffusion models. The corresponding Membership Inference Attack (MIA) success rates were 51-52%, well below the 68% success rate observed for anonymised real data. These findings support the view that synthetic data can preserve analytical value while reducing privacy risk, offering an effective approach to the safe and scalable deployment of ML in businesses.
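The abstract reports Wasserstein distance and Jensen-Shannon divergence as fidelity metrics but does not state how they were computed. The following is a minimal sketch of one common way to obtain both for a single numeric column, assuming SciPy is available. The function name `column_fidelity`, the min-max scaling step, and the 20-bin histogram are illustrative assumptions, not the authors' procedure.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance


def column_fidelity(real, synth, bins=20):
    """Return (Wasserstein distance, Jensen-Shannon divergence) for one numeric column.

    Hypothetical helper: scaling and binning choices are assumptions.
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)

    # Scale both samples by the real column's range so that distances
    # are comparable across columns with different units.
    lo, hi = real.min(), real.max()
    span = hi - lo if hi > lo else 1.0
    real_s = (real - lo) / span
    synth_s = (synth - lo) / span

    # Wasserstein distance works directly on the two empirical samples.
    wd = wasserstein_distance(real_s, synth_s)

    # Jensen-Shannon divergence needs discrete distributions, so both
    # samples are binned on a shared set of edges.
    edges = np.linspace(
        min(real_s.min(), synth_s.min()),
        max(real_s.max(), synth_s.max()),
        bins + 1,
    )
    p, _ = np.histogram(real_s, bins=edges)
    q, _ = np.histogram(synth_s, bins=edges)

    # SciPy returns the JS *distance* (square root of the divergence);
    # squaring it gives the divergence, bounded in [0, 1] for base 2.
    jsd = jensenshannon(p, q, base=2) ** 2
    return wd, jsd
```

Under these assumptions, a synthetic column drawn from the same distribution as the real one should give values near zero for both metrics, and the values grow as the two distributions drift apart.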
Keywords
DOI: https://doi.org/10.52088/ijesty.v5i4.1442
Copyright (c) 2025 Deep Barot, Kamal Mohammed Najeeb Shaik, Mohammad Mushfiqul Haque Mukit, Vinesh Melath, Rithesh Nair




























