
Category: Safety

  • The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making

    Jon Chun, Katherine Elkins (Kenyon College)
    January 30, 2026
    arXiv | PDF

    The paper investigates whether emotional framing, the kind of persuasive, sympathetic narrative that reliably biases human decision-makers, can also sway LLMs applied to rule-bound institutional decisions such as grade appeals, loan underwriting, and emergency triage. The surprising answer is no: across 12,113 responses from six different models, emotional narratives produced essentially zero decision drift (Cohen’s h = 0.003), while the same framing effects produce substantial bias in humans (Cohen’s h = 0.3–0.8).

    The “paradox” is that LLMs are known to be lexically brittle (sensitive to how a prompt is formatted) and prone to sycophancy, yet they are rationally stable when it comes to rule-based decisions. They resist emotional manipulation 110–300x better than humans. This decoupling between surface-level prompt sensitivity and deep logical consistency is counterintuitive and has significant implications for deploying AI in high-stakes institutional settings.

    Cohen’s h (Effect Size) is a statistical measure of the difference between two proportions. Values near 0 mean no practical difference; 0.2 is “small,” 0.5 is “medium,” 0.8 is “large.” The paper uses Cohen’s h to compare decision rates between emotional and neutral conditions. The LLM value of 0.003 is essentially zero.
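    Cohen’s h is defined as the absolute difference of arcsine-transformed proportions, h = |2·arcsin√p₁ − 2·arcsin√p₂|. A minimal sketch (the decision rates below are illustrative, not the paper’s data):

    ```python
    import math

    def cohens_h(p1: float, p2: float) -> float:
        """Effect size for the difference between two proportions."""
        phi = lambda p: 2 * math.asin(math.sqrt(p))  # arcsine transform
        return abs(phi(p1) - phi(p2))

    # Identical decision rates under emotional vs. neutral framing -> h = 0
    print(cohens_h(0.50, 0.50))  # 0.0
    # A gap typical of human framing studies -> roughly a "medium" effect
    print(round(cohens_h(0.50, 0.26), 2))  # ~0.5
    ```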

    Bayes Factor (BF₀₁) is a Bayesian statistic that quantifies evidence for the null hypothesis (no effect) vs. the alternative (some effect). BF₀₁ = 109 means the data is 109 times more likely under “no effect” than under “some effect”. Conventionally, anything above 100 is “extreme evidence.”
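    The exact model the paper uses is not reproduced here, but the mechanics of a Bayes factor can be shown with a toy binomial example: a point null p = p₀ against a uniform Beta(1, 1) prior on p, whose marginal likelihood works out to 1/(n + 1):

    ```python
    import math

    def bf01_binomial(k: int, n: int, p0: float = 0.5) -> float:
        """Toy BF01 for k successes in n trials: point null p = p0
        vs. a uniform Beta(1, 1) prior on p (marginal = 1/(n + 1))."""
        null_lik = math.comb(n, k) * p0**k * (1 - p0)**(n - k)
        alt_marginal = 1.0 / (n + 1)
        return null_lik / alt_marginal

    # Data sitting exactly at the null value favor "no effect" (BF01 > 1)
    print(bf01_binomial(500, 1000) > 1)   # True
    # Data far from the null favor "some effect" (BF01 < 1)
    print(bf01_binomial(900, 1000) < 1)   # True
    ```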

    Framing Effects are a well-documented class of cognitive bias in which the way information is presented (e.g., a sympathetic backstory or emotional language) changes human decisions even when the underlying facts are identical. This is a core concern in behavioral economics and legal decision-making.

    RLHF (Reinforcement Learning from Human Feedback) is the dominant fine-tuning method for instruction-following LLMs. Human raters rank model outputs, and the model is trained to prefer higher-ranked responses. It is used by the GPT, Llama, and Mistral model families.
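    The ranking step is typically implemented with a Bradley–Terry pairwise loss on a reward model; implementations vary, but the core objective is −log σ(r_chosen − r_rejected):

    ```python
    import math

    def preference_loss(r_chosen: float, r_rejected: float) -> float:
        """Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected)."""
        return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

    # Reward model already ranks the preferred response higher -> small loss
    print(round(preference_loss(2.0, 0.5), 3))
    # Ranks it lower -> large loss pushes training to flip the ordering
    print(round(preference_loss(0.5, 2.0), 3))
    ```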

    Constitutional AI is Anthropic’s training approach (used for Claude) where the model self-critiques against a set of principles rather than relying solely on human raters. The paper tests whether this different alignment approach produces different robustness characteristics (it doesn’t).

    Decision Drift is the change in a model’s decision rate when exposed to emotional framing vs. a neutral control. A drift of 0% means the model’s decisions are identical regardless of framing.

    Instruction Ablation is an experimental technique where instructions are systematically removed to test what drives a behavior. Here, removing “ignore the narrative” instructions showed that robustness isn’t dependent on explicit guardrails.
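    The harness below is a hypothetical sketch of the idea, with a stub standing in for an LLM API call; `ask` here is rule-following by construction, so its decision rate is unchanged when the guardrail instruction is removed:

    ```python
    def run_condition(model, prompts, instruction=None):
        """Query the same prompts with and without an explicit guardrail
        instruction, returning the approval rate for that condition."""
        full = [f"{instruction}\n{p}" if instruction else p for p in prompts]
        decisions = [model(q) for q in full]
        return sum(d == "approve" for d in decisions) / len(decisions)

    # Stub model that decides on the rule and ignores framing entirely
    ask = lambda prompt: "approve" if "meets criteria" in prompt else "deny"

    prompts = ["Case meets criteria.",
               "Tearful case that does not meet criteria."]
    with_guard = run_condition(ask, prompts, "Ignore the narrative.")
    without_guard = run_condition(ask, prompts)
    print(with_guard == without_guard)  # True: robustness survives ablation
    ```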

  • Robust Generalizable Heterogeneous Legal Link Prediction

    Lorenz Wendlinger, Simon Alexander Nonn, Abdullah Al Zubaer, Michael Granitzer
    February 4, 2026
    arXiv:2602.04812v1 | PDF

    The paper improves legal citation link prediction using Graph Neural Networks (GNNs). The authors introduce R-HGE (Robust Heterogeneous Graph Enrichment), which predicts missing citations between legal cases and laws more accurately than previous methods.

    Graph Neural Networks (GNNs) are deep learning models that operate on graph-structured data (nodes + edges) by iteratively passing messages between connected nodes to learn representations. After multiple rounds of neighborhood aggregation, each node captures information from its surrounding structure, enabling tasks like node classification, link prediction, and graph-level classification.                                         
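    One round of neighborhood aggregation can be sketched in a few lines; this is a generic mean-aggregation layer, not the paper's architecture:

    ```python
    import numpy as np

    def gnn_layer(features, adj, weight):
        """One message-passing round: each node averages its neighbors'
        features (plus its own via a self-loop), then applies a learned
        linear map and a ReLU nonlinearity."""
        adj_hat = adj + np.eye(adj.shape[0])          # add self-loops
        deg_inv = 1.0 / adj_hat.sum(axis=1, keepdims=True)
        aggregated = deg_inv * (adj_hat @ features)   # mean over neighborhood
        return np.maximum(aggregated @ weight, 0.0)   # linear map + ReLU

    # Toy citation graph: three nodes in a path 0 - 1 - 2
    adj = np.array([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
    x = np.array([[1., 0.], [0., 1.], [1., 1.]])      # 2-d node features
    h = gnn_layer(x, adj, np.eye(2))                  # identity weights for clarity
    print(h.round(2))
    ```

    Stacking such layers lets each node see progressively larger neighborhoods, which is what makes the learned representations useful for link prediction.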

    Robust Heterogeneous Graph Enrichment extends basic GNNs to handle real-world graphs that have multiple node/edge types (heterogeneous), missing information (enrichment fills gaps with external data or inferred connections), and noise or incompleteness (robustness). It’s particularly relevant for domains like legal AI where knowledge graphs naturally contain diverse entity types, incomplete relationships, and messy data.
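    Once node embeddings are learned, a common way to score a candidate link (here a sketch of generic dot-product scoring, not R-HGE's specific decoder) is a sigmoid of the embeddings' inner product; the embeddings below are made up for illustration:

    ```python
    import numpy as np

    def link_score(z_u, z_v):
        """Predicted probability that an edge (e.g. a missing citation)
        exists between two nodes, from their learned embeddings."""
        return float(1.0 / (1.0 + np.exp(-np.dot(z_u, z_v))))

    case = np.array([0.9, 0.1, 0.8])        # embedding of a legal-case node
    law = np.array([1.0, 0.0, 0.7])         # embedding of a statute node
    unrelated = np.array([-0.8, 0.9, -0.5])

    # A related case/law pair scores higher than an unrelated one
    print(link_score(case, law) > link_score(case, unrelated))  # True
    ```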