# Security framework for AI agents
Open-source middleware that detects and defends against AI Agent Traps. Scan inputs, RAG chunks, and outputs before they reach your LLM.
## What it detects
- **Content Injection**: hidden HTML, metadata injection, dynamic cloaking, syntactic masking
- **Behavioural Control**: jailbreak patterns, data exfiltration, unauthorized sub-agent spawning
- **Cognitive State**: RAG poisoning, memory poisoning, contextual learning manipulation
- **Semantic Manipulation**: biased framing, oversight evasion, persona hyperstition
- **ML Classifier**: DeBERTa-v3 multi-label model via ONNX; optional async detection alongside the regex patterns
- **Systemic + HITL**: congestion, cascades, collusion, approval fatigue, social engineering
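Scan results can be triaged by category, e.g. hard-blocking behavioural-control hits while merely logging semantic ones. A minimal sketch; the `Threat` shape and its `category` field are assumptions for illustration, not the library's actual types:

```ts
// Hypothetical threat shape — the real library's result objects may differ.
interface Threat {
  category:
    | 'content-injection'
    | 'behavioural-control'
    | 'cognitive-state'
    | 'semantic-manipulation'
    | 'systemic';
  pattern: string;
}

// Block only when a threat falls into one of the configured categories;
// everything else can be logged and allowed through.
function shouldBlock(threats: Threat[], blockList: Set<string>): boolean {
  return threats.some(t => blockList.has(t.category));
}

const threats: Threat[] = [
  { category: 'semantic-manipulation', pattern: 'biased framing' },
  { category: 'behavioural-control', pattern: 'jailbreak' },
];
shouldBlock(threats, new Set(['behavioural-control'])); // → true
```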
## Quick start
```ts
import { AgentArmor } from '@stylusnexus/agentarmor';

const armor = new AgentArmor();

// Scan any text before it reaches your LLM
const result = armor.scanSync(userInput);
if (result.threats.length > 0) {
  console.log('Threats detected:', result.threats);
  const safe = armor.sanitize(userInput, result);
}

// Filter RAG chunks before context assembly
const clean = armor.scanRAGChunksSync(chunks)
  .filter(r => r.threats.length === 0);
```
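Wired into a request path, the calls above amount to a small guard function: pass clean input through untouched, sanitize flagged input, and report what was caught. A sketch against a minimal interface matching only the calls shown in the quick start (the real return types may carry more fields):

```ts
// Minimal structural interface — just the methods used in the quick start.
interface ScanResult {
  threats: { pattern: string }[];
}
interface Scanner {
  scanSync(text: string): ScanResult;
  sanitize(text: string, result: ScanResult): string;
}

// Guard any text on its way to the LLM.
function guardInput(
  armor: Scanner,
  input: string
): { text: string; blocked: string[] } {
  const result = armor.scanSync(input);
  if (result.threats.length === 0) {
    return { text: input, blocked: [] };
  }
  return {
    text: armor.sanitize(input, result),
    blocked: result.threats.map(t => t.pattern),
  };
}
```

The structural typing means any object exposing `scanSync` and `sanitize` satisfies `Scanner`, so the guard is easy to unit-test with a stub before wiring in the real middleware.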
## Eval results (v0.4.0 patterns)
| Strictness | Detection Rate | False Positive Rate |
|---|---|---|
| Permissive | 79.7% | 0.0% |
| Balanced | 89.8% | 0.0% |
| Strict | 89.8% | 0.0% |
86 curated samples (59 adversarial, 27 benign) drawn from WASP, HackAPrompt, Greshake et al., and 2025-2026 real-world incidents.

The set includes 10 samples from real-world attacks (MCP poisoning, RAG saturation, supply chain injection) that the regex patterns do not yet catch; these measure the gap the ML classifier closes. On the original 49 adversarial samples, regex detection at the balanced strictness is 100%.
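The table's two rates can be recomputed from any labeled sample set: detection rate is the share of adversarial samples flagged, and false positive rate is the share of benign samples flagged. A sketch (the `Sample` shape is illustrative, not part of the library):

```ts
// One labeled eval sample: was it adversarial, and did the scanner flag it?
interface Sample {
  adversarial: boolean;
  flagged: boolean;
}

// Detection rate = flagged adversarial / total adversarial.
// False positive rate = flagged benign / total benign.
function evalRates(samples: Sample[]): {
  detectionRate: number;
  falsePositiveRate: number;
} {
  const adv = samples.filter(s => s.adversarial);
  const benign = samples.filter(s => !s.adversarial);
  return {
    detectionRate: adv.filter(s => s.flagged).length / adv.length,
    falsePositiveRate: benign.filter(s => s.flagged).length / benign.length,
  };
}
```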