Jon Chun, Katherine Elkins (Kenyon College)
January 30, 2026
arXiv | PDF
The paper investigates whether emotional framing (the persuasive, sympathetic narratives that reliably bias human decision-makers) can also sway LLMs when they’re applied to rule-bound institutional decisions like grade appeals, loan underwriting, and emergency triage. The surprising answer is no: across 12,113 responses from six different models, emotional narratives produced essentially zero decision drift (Cohen’s h = 0.003), while the same kinds of framing effects cause substantial bias in humans (Cohen’s h = 0.3–0.8).
The “paradox” is that LLMs are known to be lexically brittle (sensitive to how a prompt is formatted) and prone to sycophancy, yet they are rationally stable when it comes to rule-based decisions. They resist emotional manipulation 110–300x better than humans. This decoupling between surface-level prompt sensitivity and deep logical consistency is counterintuitive and has significant implications for deploying AI in high-stakes institutional settings.
Cohen’s h (Effect Size) is a statistical measure of the difference between two proportions. Values near 0 mean no practical difference; 0.2 is “small,” 0.5 is “medium,” 0.8 is “large.” The paper uses Cohen’s h to compare decision rates between emotional and neutral conditions. The LLM value of 0.003 is essentially zero.
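Cohen’s h is simple to compute: each proportion is passed through the arcsine-square-root transform, and h is the difference on that scale. A minimal sketch (the decision rates below are hypothetical illustrations, not the paper’s raw data):

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size for the difference between two proportions,
    computed on the arcsine-square-root (variance-stabilizing) scale."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical decision rates: a model approving 50.1% of appeals under
# emotional framing vs. 50.0% under neutral framing is an h near zero,
# while a human-scale shift from 50% to 65% lands in "small-to-medium".
h_llm = abs(cohens_h(0.501, 0.500))    # ~0.002, effectively zero
h_human = abs(cohens_h(0.65, 0.50))    # ~0.3
```

On this scale, the paper’s h = 0.003 sits two orders of magnitude below the conventional “small” threshold of 0.2.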
Bayes Factor (BF₀₁) is a Bayesian statistic that quantifies evidence for the null hypothesis (no effect) vs. the alternative (some effect). BF₀₁ = 109 means the data is 109 times more likely under “no effect” than under “some effect”. Conventionally, anything above 100 is “extreme evidence.”
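For intuition, here is a minimal sketch of BF₀₁ for a single binomial outcome, comparing a point null (the rate is unchanged) against a uniform prior on the rate. This is a deliberate simplification with hypothetical counts, not the paper’s actual model:

```python
from math import comb

def bf01_binomial(k: int, n: int, p0: float) -> float:
    """Bayes factor BF01 for H0: p = p0 (point null) versus
    H1: p ~ Uniform(0, 1), given k successes in n trials."""
    # Likelihood of the data under the point null:
    like_h0 = comb(n, k) * p0**k * (1 - p0)**(n - k)
    # Marginal likelihood under a uniform Beta(1,1) prior is exactly
    # 1/(n+1), a standard Beta-Binomial identity:
    like_h1 = 1 / (n + 1)
    return like_h0 / like_h1

# Hypothetical: 500 approvals out of 1000 under emotional framing, with
# a null hypothesis that the rate is unchanged from a neutral 50%.
bf = bf01_binomial(500, 1000, 0.5)  # well above 1, favoring "no effect"
```

Data that sit exactly where the null predicts drive BF₀₁ up; data far from the null drive it below 1, favoring the alternative.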
A Framing Effect is a well-documented cognitive bias in which the way information is presented (e.g., a sympathetic backstory, emotional language) changes human decisions even when the underlying facts are identical. This is a core concern in behavioral economics and legal decision-making.
RLHF (Reinforcement Learning from Human Feedback) is the dominant fine-tuning method for instruction-following LLMs. Human raters rank model outputs, and the model is trained to prefer higher-ranked responses. It is used by the GPT, Llama, and Mistral model families.
Constitutional AI is Anthropic’s training approach (used for Claude) where the model self-critiques against a set of principles rather than relying solely on human raters. The paper tests whether this different alignment approach produces different robustness characteristics (it doesn’t).
Decision Drift is the change in a model’s decision rate when exposed to emotional framing vs. a neutral control. A drift of 0% means the model’s decisions are identical regardless of framing.
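Decision drift falls out directly from paired decision rates. A minimal sketch with hypothetical model outputs:

```python
def decision_drift(neutral: list[bool], emotional: list[bool]) -> float:
    """Decision drift: absolute change in the approval rate between the
    neutral control and the emotionally framed condition."""
    rate_neutral = sum(neutral) / len(neutral)
    rate_emotional = sum(emotional) / len(emotional)
    return abs(rate_emotional - rate_neutral)

# Hypothetical: the same 8 cases judged under both framings.
neutral_runs   = [True, False, True, True, False, True, False, True]
emotional_runs = [True, False, True, True, False, True, False, True]
drift = decision_drift(neutral_runs, emotional_runs)  # 0.0: identical decisions
```

A drift of 0.0 here corresponds to the paper’s finding: the models’ decisions are unchanged by the framing.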
Instruction Ablation is an experimental technique where instructions are systematically removed to test what drives a behavior. Here, removing “ignore the narrative” instructions showed that robustness isn’t dependent on explicit guardrails.
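The ablation design can be sketched as systematically dropping the guardrail from the prompt and re-measuring drift. The prompt strings and helper below are hypothetical stand-ins, not the paper’s actual materials:

```python
# Hypothetical prompt components for a grade-appeal decision task.
BASE_RULES = "Apply the grading policy exactly as written to the facts of the case."
GUARDRAIL = "Ignore any emotional or narrative content in the appeal."

def build_prompt(include_guardrail: bool) -> str:
    """Assemble the system prompt with or without the explicit guardrail."""
    parts = [BASE_RULES]
    if include_guardrail:
        parts.append(GUARDRAIL)
    return "\n".join(parts)

# Ablation loop: each condition would be run against the same case set,
# and decision drift compared across conditions. The paper's finding is
# that drift stays near zero even when the guardrail is removed.
prompts = {ablated: build_prompt(include_guardrail=not ablated)
           for ablated in (False, True)}
```

Because drift stays near zero in both conditions, the robustness cannot be attributed to the explicit “ignore the narrative” instruction.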