Centaur May Have Learned a Shortcut that Explains Away Psychological Tasks
Abstract
In a recent landmark effort, an international collaboration of cognitive scientists produced Psych-101, the largest natural-language behavioral dataset on human cognition, comprising over 10 million human decisions across 160 psychological experiments. Building on this resource, the authors fine-tuned a pretrained large language model (LLM), called Centaur, to predict human choices in these experiments. While Centaur demonstrates impressive predictive performance, especially relative to domain-specific cognitive models, we find that much of its advantage stems from leveraging sequential dependencies in human choices. Over-reliance on such dependencies risks marginalizing the task-driven mechanisms that are also central to explaining human behavior. By reanalyzing the original Centaur model through controlled experiments that isolate task information from choice history, we find that Centaur outperforms domain-specific cognitive models even when no psychological task is provided, yet underperforms on other tasks when choice history is removed. These findings suggest that Centaur may have learned a shortcut that is insensitive to psychological tasks.