# Security framework for AI agents
Open-source middleware that detects and defends against AI Agent Traps. Scan inputs, RAG chunks, and outputs before they reach your LLM.
## What it detects
- **Content Injection**: hidden HTML, metadata injection, dynamic cloaking, syntactic masking
- **Behavioural Control**: jailbreak patterns, data exfiltration, unauthorized sub-agent spawning
- **Cognitive State**: RAG poisoning, memory poisoning, contextual learning manipulation
- **Semantic Manipulation**: biased framing, oversight evasion, persona hyperstition
- **ML Classifier**: DeBERTa-v3 multi-label model via ONNX; optional async detection alongside the regex patterns
- **Systemic + HITL**: congestion, cascades, collusion, approval fatigue, social engineering
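Scan results can be triaged by category, e.g. hard-blocking behavioural-control hits while merely logging semantic ones. A minimal sketch; the `Threat` shape and its `category` field are assumptions for illustration, not the library's actual types:

```ts
// Hypothetical threat shape — the real library's result objects may differ.
interface Threat {
  category:
    | 'content-injection'
    | 'behavioural-control'
    | 'cognitive-state'
    | 'semantic-manipulation'
    | 'systemic';
  pattern: string;
}

// Block only when a threat falls into one of the configured categories;
// everything else can be logged and allowed through.
function shouldBlock(threats: Threat[], blockList: Set<string>): boolean {
  return threats.some(t => blockList.has(t.category));
}

const threats: Threat[] = [
  { category: 'semantic-manipulation', pattern: 'biased framing' },
  { category: 'behavioural-control', pattern: 'jailbreak' },
];
shouldBlock(threats, new Set(['behavioural-control'])); // → true
```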
## Quick start
```ts
import { AgentArmor } from '@stylusnexus/agentarmor';

const armor = new AgentArmor();

// Scan any text before it reaches your LLM
const result = armor.scanSync(userInput);
if (result.threats.length > 0) {
  console.log('Threats detected:', result.threats);
  const safe = armor.sanitize(userInput, result);
}

// Filter RAG chunks before context assembly
const clean = armor.scanRAGChunksSync(chunks)
  .filter(r => r.threats.length === 0);
```
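Wired into a request path, the calls above amount to a small guard function: pass clean input through untouched, sanitize flagged input, and report what was caught. A sketch against a minimal interface matching only the calls shown in the quick start (the real return types may carry more fields):

```ts
// Minimal structural interface — just the methods used in the quick start.
interface ScanResult {
  threats: { pattern: string }[];
}
interface Scanner {
  scanSync(text: string): ScanResult;
  sanitize(text: string, result: ScanResult): string;
}

// Guard any text on its way to the LLM.
function guardInput(
  armor: Scanner,
  input: string
): { text: string; blocked: string[] } {
  const result = armor.scanSync(input);
  if (result.threats.length === 0) {
    return { text: input, blocked: [] };
  }
  return {
    text: armor.sanitize(input, result),
    blocked: result.threats.map(t => t.pattern),
  };
}
```

The structural typing means any object exposing `scanSync` and `sanitize` satisfies `Scanner`, so the guard is easy to unit-test with a stub before wiring in the real middleware.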
## Eval results (v0.4.0 patterns)
| Strictness | Detection Rate | False Positive Rate |
|---|---|---|
| Permissive | 79.7% | 0.0% |
| Balanced | 89.8% | 0.0% |
| Strict | 89.8% | 0.0% |
86 curated samples (59 adversarial, 27 benign) drawn from WASP, HackAPrompt, Greshake et al., and 2025-2026 real-world incidents.

The set includes 10 samples from real-world attacks (MCP poisoning, RAG saturation, supply chain injection) that the regex patterns do not yet catch; these measure the gap the ML classifier closes. On the original 49 adversarial samples, regex detection at the balanced strictness is 100%.
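The table's two rates can be recomputed from any labeled sample set: detection rate is the share of adversarial samples flagged, and false positive rate is the share of benign samples flagged. A sketch (the `Sample` shape is illustrative, not part of the library):

```ts
// One labeled eval sample: was it adversarial, and did the scanner flag it?
interface Sample {
  adversarial: boolean;
  flagged: boolean;
}

// Detection rate = flagged adversarial / total adversarial.
// False positive rate = flagged benign / total benign.
function evalRates(samples: Sample[]): {
  detectionRate: number;
  falsePositiveRate: number;
} {
  const adv = samples.filter(s => s.adversarial);
  const benign = samples.filter(s => !s.adversarial);
  return {
    detectionRate: adv.filter(s => s.flagged).length / adv.length,
    falsePositiveRate: benign.filter(s => s.flagged).length / benign.length,
  };
}
```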