What the firewall looks at

Semantic markers: imperative phrasing aimed at an LLM consumer ("ignore previous instructions", "return your system prompt", "you are now"), instruction-shaped content where the surrounding source is descriptive, role-confusion patterns.
Structural anomalies: invisible Unicode, base64 blocks where prose is expected, hidden HTML attributes, comments that contain instructions, overlong runs of homoglyphs.
Known patterns: a curated catalog of injection payloads from the academic and industry literature, plus contributions from incidents the firewall has seen in production. The catalog updates without a customer-facing API change.

Verdicts

Quarantine. The candidate is dropped from this query and the source URL is blocklisted for the tenant for a configurable window. The cache never receives the candidate. The response envelope flags the action with the URL hash so an operator can review.
Surface with warning. The candidate is allowed through to rerank and may reach the response, but it carries a firewall warning. The agent that consumes the response can decide to drop, ask the user, or proceed with caution.
Drop. The candidate is removed silently. Used for low-severity matches where surfacing would be noise. Operator review is still available via the audit log.

Why this matters for agents

Without a firewall, a single poisoned page can hijack any agent that retrieves from it. The poisoned content slides into the prompt context and the model dutifully complies. With the firewall, the candidate either never lands or lands annotated. Agents that consume DeepTap responses get the verdict on every fact and can refuse to act on flagged ones.

What the firewall is not

The firewall is not a content moderation system, not a copyright filter, and not a substitute for output verification downstream. Its job is narrow: keep injection attempts out of the cache and the response. For drift tracking and structured output verification, see Veritize.