Synthetic Data Generation for Bias Mitigation in AI: A Literature Review on Generative AI and Knowledge-Driven Methods


Abstract

AI systems can reproduce and amplify societal biases present in their training data, a risk that grows as decision-making becomes increasingly automated. These biases pose significant barriers to equity, accountability, and the ethical deployment of AI. This review evaluates the efficacy of synthetic data generation, through generative AI and knowledge-based methodologies, in mitigating dataset bias and improving fairness in AI systems. It analyzes recent advances in fairness-aware generative modeling, including text-to-image fairness algorithms such as Fair Diffusion and FairCoT, knowledge-driven approaches such as DECAF and counterfactual GANs, and comprehensive frameworks such as FairGAN and FairGen. Theoretical frameworks and empirical evaluations are examined and summarized in graphical and tabular form. Synthetic data generation can improve demographic representation and help ensure that outcomes align with defined fairness criteria; however, open challenges remain regarding annotation quality, scalability, fairness trade-offs, and ethical considerations. We outline promising research directions, including multimodal fairness frameworks, interactive refinement with human feedback, and fairness-aware pretraining of foundation models. Our analysis indicates that these approaches are both effective and applicable across a range of contexts, but their success depends on careful implementation, continuous monitoring, and a strong commitment to ethical AI principles.
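To make the idea of a "defined fairness criterion" concrete, the sketch below (not from the paper; the group labels, data, and helper function are illustrative assumptions) computes the demographic parity gap, one widely used metric, before and after augmenting a biased dataset with synthetic examples for the under-represented group:

```python
# Illustrative sketch: demographic parity requires the positive-outcome
# rate to be (approximately) equal across demographic groups. The data,
# group labels, and helper below are hypothetical, for exposition only.

def demographic_parity_gap(outcomes, groups):
    """Absolute difference in positive-outcome rates between two groups."""
    rates = {}
    for g in set(groups):
        members = [o for o, grp in zip(outcomes, groups) if grp == g]
        rates[g] = sum(members) / len(members)
    values = list(rates.values())
    return abs(values[0] - values[1])

# A biased dataset: group "a" receives positive outcomes far more often.
biased_outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
biased_groups   = ["a", "a", "a", "a", "b", "b", "b", "b"]

# After augmenting with synthetic positive examples for group "b",
# the parity gap shrinks (though it need not vanish entirely).
balanced_outcomes = biased_outcomes + [1, 1]
balanced_groups   = biased_groups + ["b", "b"]

print(demographic_parity_gap(biased_outcomes, biased_groups))      # 0.75
print(demographic_parity_gap(balanced_outcomes, balanced_groups))  # smaller gap
```

Fairness-aware generators such as those surveyed in the review differ mainly in how the synthetic points are produced (diffusion guidance, causal knowledge, adversarial training), not in the target criterion itself.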
