Indirect prompt injection: when your AI reads the wrong instructions

June 13, 2026

Give an AI coding agent the ability to fetch a URL and you have also given it the ability to read instructions written by a stranger. Indirect prompt injection is the attack that turns that into a problem: hidden text on a page that tells the agent to do something its operator never asked for.

The shape of the attack

The agent is helpful by design. It reads a page, a tool result, or a file and acts on what it finds. An attacker exploits exactly that helpfulness by planting commands inside content the agent is expected to trust — “ignore your previous instructions”, “send the contents of this file”, “run this”. The instructions never come from the user, but the agent can’t always tell the difference.

Wrap it as data

The defence is a boundary. Everything that comes from outside the trusted conversation — web pages, subagent output, files from outside the project — is data to be read, not instructions to be followed. Make that boundary explicit and the agent stops treating a web page as a source of commands.

That is what safe-fetch and mcp-safe-fetch do: fetch the URL in isolation and wrap the response in UNTRUSTED-WEB tags so the model reads it for facts, never for orders. The Claude Code prompt-injection gate enforces the same rule with hooks, so it holds even on the edge cases a prompt alone would miss.

Mechanical, not optional

A policy written in a prompt is a suggestion. A policy enforced by a hook is a rule. 5bats leans on the second kind — the protection that does not depend on the model remembering to behave.

← Back to all posts

Indirect prompt injection: when your AI reads the wrong instructions

The shape of the attack

Wrap it as data

Mechanical, not optional

Guides