Unintended Memorization of Sensitive Information in Fine-Tuned Language Models

Marton Szep, Jorge Marin Ruiz, Georgios Kaissis, Paulina Seidl, Rüdiger von Eisenhart-Rothe, Florian Hinterwimmer, Daniel Rueckert (Technical University of Munich, Imperial College London)
arXiv:2601.17480v1
January 24, 2026


The paper investigates a critical privacy vulnerability: LLMs can memorize and leak personally identifiable information (PII) that appears only in training inputs, never in the training targets. Even when PII is irrelevant to the downstream task, fine-tuned models can be prompted into revealing names, addresses, and other sensitive data.