LLM-Based Persona-Driven Text Data Augmentation

Hyeon Seong Jeong
Han Kyeong Ko
Taehoon Kim

Published on Apr 23, 2025


Abstract

The rise of drug-related crime in South Korea, especially over online messengers, exposes the limits of keyword-based and network-tracking detection methods. To address this in a low-resource setting, we propose a persona-driven data-augmentation framework built on a large language model (LLM). Buyer and seller personas reproduce authentic linguistic patterns, slang and delivery practices, and are used to generate realistic, context-rich dialogues. Using text-embedding similarity, type–token ratio (TTR), perplexity, dialogue coherence and ROUGE, we show that 15 000 augmented dialogues closely mirror 87 real conversations while increasing lexical variety and contextual consistency. The results confirm that persona-driven augmentation mitigates data scarcity and improves illicit-dialogue detectors, offering a transferable strategy for other sensitive, low-data domains such as voice phishing or fraudulent trade.
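Two of the evaluation metrics named in the abstract, TTR and ROUGE, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes lowercase whitespace tokenization, a corpus-level TTR, and the unigram ROUGE-1 F1 variant, none of which the abstract specifies. The function names `tokenize`, `type_token_ratio` and `rouge1_f1` are hypothetical, and the example strings are neutral toy data.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. Korean text would
    # normally need a morphological analyzer instead.
    return text.lower().split()


def type_token_ratio(texts):
    # Corpus-level TTR: distinct tokens divided by total tokens.
    tokens = [tok for text in texts for tok in tokenize(text)]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def rouge1_f1(candidate, reference):
    # ROUGE-1 F1: harmonic mean of unigram precision and recall,
    # with overlap counts clipped by the reference counts.
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy strings for illustration only, not data from the study.
    print(type_token_ratio(["a b a", "b c"]))
    print(rouge1_f1("the cat sat", "the cat ran"))
```

TTR falls as a corpus grows, so comparing 15 000 generated dialogues against 87 real ones on raw TTR would favor the smaller set. Computing it over equal-sized samples is one way to keep the comparison fair.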
