Improving Counterfactual Story Rewriting with Policy-Gradient Approaches


Abstract

Counterfactual story rewriting is the task of revising an existing narrative in light of an alternative event while retaining the unchanged elements of the story and its overall coherence. This task is challenging for NLP models because the expected changes to the original story are typically small and circumscribed, and conventional training objectives such as maximum likelihood fail to capture them effectively. For this reason, in this paper we propose a reinforcement learning (RL) approach to counterfactual story rewriting that explicitly rewards the desired counterfactual changes. Specifically, we fine-tune a seq2seq model using policy-gradient approaches (REINFORCE with baseline and proximal policy optimization) with a reward function designed to capture both adherence to the reference edited story and semantic coherence. Experimental results on the TimeTravel dataset show that our RL-based approach produces better rewritings than the conventionally trained baseline and outperforms two contemporary large language models on this task. Overall, our findings highlight the benefit of reinforcement learning for complex, controlled text generation tasks requiring nuanced predictions.
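
The abstract describes policy-gradient fine-tuning of a seq2seq model with a reward combining reference adherence and coherence. The following is a minimal sketch of a REINFORCE-with-baseline update, assuming a Hugging Face T5 model; the model name, reward weighting, and the toy `reward_fn` are illustrative assumptions, not the paper's exact configuration.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)


def reward_fn(generated: str, reference: str) -> float:
    """Toy reward: token overlap with the reference edited ending.
    The paper combines reference adherence with semantic coherence;
    this overlap score stands in for both, purely for illustration."""
    gen, ref = set(generated.split()), set(reference.split())
    return len(gen & ref) / max(len(ref), 1)


def reinforce_step(prompt: str, reference: str, baseline: float) -> float:
    # Sample a rewritten ending from the current policy.
    inputs = tokenizer(prompt, return_tensors="pt")
    sampled = model.generate(**inputs, do_sample=True, max_new_tokens=64)
    text = tokenizer.decode(sampled[0], skip_special_tokens=True)

    # Score the sample and compute the advantage against a running baseline.
    reward = reward_fn(text, reference)
    advantage = reward - baseline

    # Log-probability of the sampled sequence under the current policy
    # (drop the decoder start token, mask padding in the labels).
    labels = sampled[:, 1:].clone()
    labels[labels == tokenizer.pad_token_id] = -100
    out = model(**inputs, labels=labels)
    # out.loss is the mean token-level NLL, so weighting it by the advantage
    # increases the log-probability of above-baseline samples and decreases
    # that of below-baseline ones.
    loss = advantage * out.loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward

In practice the baseline would be maintained as a running average of recent rewards, and the PPO variant mentioned in the abstract would additionally clip the ratio between new and old policy probabilities rather than using this plain policy-gradient update.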
