Introduction

Stronghold is a pay-per-request AI security scanning platform that protects AI agents from prompt injection attacks and credential leaks. It is built on the Citadel AI security scanner and uses the x402 HTTP payment protocol for frictionless, per-request billing.

The Problem

AI agents interact with untrusted content from the internet — web pages, API responses, user messages, tool outputs. This content can contain prompt injection attacks that hijack the agent’s behavior, or the agent itself can inadvertently leak credentials and sensitive data in its responses.

Traditional security approaches require the agent to call a scanning API after it has already read the content. But if the content contains a prompt injection, the agent may have already been compromised before it gets a chance to check.

How Stronghold Solves It

Stronghold provides bidirectional protection at the network level:

External content -> [PROXY scans] -> Agent -> [API scans] -> Output
                          |                        |
                      Injection            Credential leaks
                     (INCOMING)               (OUTGOING)
  • Incoming content is scanned for prompt injection by the transparent proxy before it reaches the agent. The agent never sees malicious content.
  • Outgoing responses are scanned for credential leaks by the API before they leave the system.

This two-way architecture means attacks are caught at the perimeter, not inside the agent’s reasoning loop.
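The perimeter idea can be illustrated with a minimal in-process sketch in Go. This is not Stronghold's code: Stronghold does the incoming scan in a network proxy and the outgoing scan in its API, whereas the hypothetical `Guard` wrapper below only mimics the ordering of the checks. The names `Guard`, `scanIncoming`, and `scanOutgoing`, the single injection phrase, and the AWS-style key pattern are all assumptions chosen for illustration.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
)

// Sentinel errors for the two directions (hypothetical names).
var (
	ErrInjection      = errors.New("incoming content blocked: possible prompt injection")
	ErrCredentialLeak = errors.New("outgoing response blocked: possible credential leak")
)

// awsKeyPattern matches the well-known AWS access key ID shape.
// A real scanner would cover many more credential formats.
var awsKeyPattern = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)

// scanIncoming is a toy stand-in for the proxy-side injection scan.
func scanIncoming(content string) error {
	if strings.Contains(strings.ToLower(content), "ignore previous instructions") {
		return ErrInjection
	}
	return nil
}

// scanOutgoing is a toy stand-in for the API-side credential scan.
func scanOutgoing(response string) error {
	if awsKeyPattern.MatchString(response) {
		return ErrCredentialLeak
	}
	return nil
}

// Guard wraps an agent so that it only receives scanned input and only
// emits scanned output. The agent is never called on blocked input.
func Guard(agent func(string) string) func(string) (string, error) {
	return func(input string) (string, error) {
		if err := scanIncoming(input); err != nil {
			return "", err
		}
		output := agent(input)
		if err := scanOutgoing(output); err != nil {
			return "", err
		}
		return output, nil
	}
}
```

The point of the wrapper is the ordering: the incoming check runs before the agent function is invoked, so blocked content never enters the agent's context, and the outgoing check runs before anything is returned to the caller.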

Components

Stronghold has three components:

| Component | Description |
| --- | --- |
| API Server | Go/Fiber HTTP service that performs 4-layer security scanning. Accepts x402 payments per request. |
| CLI Client | Cobra/Bubbletea command-line tool for system setup, proxy management, wallet operations, and account management. |
| Transparent Proxy | Network-level traffic interceptor that routes all HTTP/HTTPS traffic through Stronghold scanning before it reaches the agent. |

4-Layer Scanning

Every scan request passes through up to four detection layers, from fastest to most thorough:

  1. Heuristic — Pattern matching against known injection signatures. Sub-millisecond.
  2. ML Classification — Citadel/Hugot neural network classifier. ~5ms.
  3. Semantic Similarity — Embedding-based comparison against known attack patterns. ~10ms.
  4. LLM Analysis (optional) — Any OpenRouter-compatible LLM reasoning about intent. ~500ms.

Each layer produces a score, and the scores are combined into a final decision: ALLOW, WARN, or BLOCK.
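One way such a layered pipeline could be structured is sketched below in Go. The source does not say how the scores are combined or what thresholds apply, so everything here is an assumption: the `Layer` type, the `Decide` function, taking the maximum score as the combination rule, the 0.5 and 0.8 thresholds, and the early exit that skips slower layers once a block is certain.

```go
package main

import "strings"

// Decision is the final verdict named in the text.
type Decision string

const (
	Allow Decision = "ALLOW"
	Warn  Decision = "WARN"
	Block Decision = "BLOCK"
)

// Layer is a hypothetical detection layer. Score returns a value in [0, 1],
// where higher means more likely malicious.
type Layer struct {
	Name  string
	Score func(content string) float64
}

// Hypothetical thresholds, chosen only for illustration.
const (
	warnThreshold  = 0.5
	blockThreshold = 0.8
)

// Decide runs the layers in the order given (cheapest first), keeps the
// highest score seen, and stops early once that score is enough to block,
// so the slower layers are skipped when a fast layer is already conclusive.
func Decide(layers []Layer, content string) (Decision, float64) {
	maxScore := 0.0
	for _, l := range layers {
		if s := l.Score(content); s > maxScore {
			maxScore = s
		}
		if maxScore >= blockThreshold {
			return Block, maxScore
		}
	}
	if maxScore >= warnThreshold {
		return Warn, maxScore
	}
	return Allow, maxScore
}

// HeuristicLayer is a toy stand-in for signature matching.
func HeuristicLayer() Layer {
	return Layer{
		Name: "heuristic",
		Score: func(content string) float64 {
			if strings.Contains(strings.ToLower(content), "ignore previous instructions") {
				return 0.9
			}
			return 0.0
		},
	}
}
```

Calling `Decide([]Layer{HeuristicLayer()}, text)` returns the verdict and the score that produced it. Ordering the slice from fastest to slowest layer is what lets the early exit avoid the expensive ML, embedding, and LLM stages for obvious attacks.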

Open Source and Self-Hostable

Stronghold is MIT licensed. You can use the hosted service at api.getstronghold.xyz or deploy the entire stack yourself. See the Self-Hosting guide for details.

Next Steps