OpenAI Tracks Down GPT-5.5 Goblins

2026-05-01 · openai

OpenAI published a post explaining why GPT-5.5 developed a habit of dropping “goblins” and “gremlins” into its responses, especially in Codex. The write-up is a rare behind-the-scenes look at how personality tuning and reward shaping can produce unexpected model behavior. For anyone building with agentic models, it’s a solid reminder that style quirks can spread through training pipelines when nobody is checking for them.

Key Features or Updates

OpenAI says the issue came from reward signals tied to a “Nerdy” personality, where creature metaphors were accidentally over-rewarded. The team traced the behavior through production traffic, RL data, and SFT data, then removed the offending signal and filtered creature-heavy samples.

Impact on Developers

This matters if you build with Codex or any agent workflow that reuses model outputs in training. It shows how a single prompt or reward choice can leak into the model's general behavior, even outside the condition it was originally tied to (here, the “Nerdy” personality).

How to use it

Treat style tuning as a system-level risk, not a cosmetic tweak. If you rely on generated rollouts for fine-tuning, audit them for repeated lexical tics, weird overfitting, and unintended personality transfer.
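
One way to start such an audit is to compare how often each word shows up in generated rollouts versus a trusted baseline corpus. The sketch below is only an illustration under stated assumptions, not the method OpenAI describes: the function names, the thresholds, and the idea of using human-written text as the baseline are all hypothetical choices.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def doc_frequency(texts):
    """Return the fraction of texts in which each word appears at least once."""
    counts = Counter()
    for text in texts:
        # Use a set so a word repeated within one text is counted once.
        counts.update(set(TOKEN_RE.findall(text.lower())))
    total = max(len(texts), 1)
    return {word: count / total for word, count in counts.items()}


def flag_lexical_tics(rollouts, baseline, min_rate=0.05, min_ratio=5.0):
    """Flag words that are far more common in rollouts than in the baseline.

    min_rate and min_ratio are hypothetical thresholds; tune them per corpus.
    Returns (word, rollout_rate, ratio) tuples, highest ratio first.
    """
    rollout_freq = doc_frequency(rollouts)
    baseline_freq = doc_frequency(baseline)
    # Floor the baseline rate so unseen words do not divide by zero.
    floor = 1.0 / (len(baseline) + 1)
    flagged = []
    for word, rate in rollout_freq.items():
        if rate < min_rate:
            continue
        ratio = rate / max(baseline_freq.get(word, 0.0), floor)
        if ratio >= min_ratio:
            flagged.append((word, rate, ratio))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only; real audits need far larger samples.
    rollouts = [
        "The goblin in the cache ate the config",
        "Another goblin lurks in this loop",
        "Fixed the gremlin and the goblin in parser",
        "Refactored the parser",
    ]
    baseline = [
        "Fixed the parser",
        "Refactored the loop in the cache",
        "Updated the config",
        "The loop is fine",
    ]
    for word, rate, ratio in flag_lexical_tics(rollouts, baseline, min_ratio=3.5):
        print(f"{word}: in {rate:.0%} of rollouts, {ratio:.1f}x baseline")
```

A word-level check like this only catches lexical tics. Broader personality transfer would need other signals, such as sampling outputs under prompts that never asked for the styled persona and reviewing them by hand.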
