I watched an agent delete production data last month. Not because it was compromised. Not because credentials leaked. Because it did exactly what it was designed to do: read context, make a decision, execute a tool. The context happened to include a poisoned markdown file with embedded instructions. The decision was "delete these temporary files." The tool was a shell command executor with database access. You can guess what happened.
The architecture failed at a fundamental level. The agent had direct execution capability. There was no verification layer between "the LLM decided to do this" and "the system did it." Decision and execution were coupled in the same process with the same permissions. When the decision went wrong, execution immediately followed.
This is the architectural mistake most teams make. They build agents as monolithic systems where the LLM both decides what to do and has the capability to do it. This works in demos. In production, it's a matter of when, not if, something goes catastrophically wrong.
The solution is borrowed from network security: the DMZ pattern. In networking, you never put internet-facing services directly on your internal network. You create a demilitarized zone—a neutral space where traffic is inspected, validated, and filtered before reaching internal systems. The DMZ doesn't trust inbound traffic. It doesn't trust outbound requests. It sits between untrusted and trusted zones, enforcing policy.
We need the same pattern for agents. The agent decides what to do. The DMZ validates those decisions. Only validated actions reach execution. This architectural separation creates a trust boundary that protects against compromised decision-making without sacrificing agent autonomy.
The Fundamental Problem: Trust Without Verification
Most agent architectures look like this: LLM receives input, processes context, generates tool calls, executes tools directly. The LLM has ambient authority—it can execute any tool it has credentials for. There's no intermediate verification. If the LLM decides to call a tool, the tool executes.
This coupling creates a single point of failure. Compromise the LLM's decision-making (through prompt injection, context poisoning, or any other attack vector) and you've compromised execution. The trust model is binary: either you trust the LLM completely or you don't deploy it.
But trust shouldn't be binary. You can trust the LLM to make reasonable decisions most of the time while acknowledging it can be manipulated. You can trust it with read operations but require validation for writes. You can trust it during business hours but enforce stricter controls overnight.
The mistake is architectural coupling. When decision and execution are in the same component, you can't enforce nuanced trust boundaries. You either give the agent capability or you don't. There's no middle ground.
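To make the coupling concrete, here's a minimal sketch of the monolithic anti-pattern (all names are hypothetical stand-ins): the same loop that reads the model's decision also executes it, with nothing in between.

```python
# Anti-pattern: decision and execution coupled in one process.
# `call_llm` and the tool table are hypothetical placeholders.

TOOLS = {
    # Real execution capability, directly reachable from the decision
    "delete_files": lambda params: print(f"deleting {params['path']}"),
}

def call_llm(context: str) -> dict:
    """Stand-in for a model call that returns a tool invocation."""
    return {"tool": "delete_files", "params": {"path": "/tmp/cache"}}

def agent_turn(context: str) -> None:
    decision = call_llm(context)                 # influenced by untrusted context
    TOOLS[decision["tool"]](decision["params"])  # executes immediately, unverified
```

If the context is poisoned, nothing stands between the first line of `agent_turn` and the second.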
The network security analogy is exact:
Traditional network: Applications run on servers directly connected to the internet. If an application is compromised, attackers have direct access to the internal network.
DMZ architecture: Internet-facing applications run in an isolated network zone. Traffic from the DMZ to internal networks flows through a firewall that enforces policy. Application compromise doesn't automatically grant internal access.
Traditional agent: LLM makes decisions and executes tools directly. If decision-making is compromised, malicious tools execute immediately.
DMZ architecture: LLM makes decisions in one component. Proposed actions flow through a validation layer before reaching execution. Compromised decisions don't automatically execute.
The key insight: separation of concerns creates defensible boundaries.
Decision-making should be untrusted by default. Execution should require explicit authorization. The validation layer sits between them, enforcing policy without direct knowledge of either component's internals.
The invariant to maintain:
No tool executes without passing through validation. No exceptions, no bypass paths, no "trusted" agents that skip validation. Every execution request, from every agent, at every time, must be validated.
This seems expensive and slow. It is, compared to direct execution. But the alternative—trusting LLM decisions unconditionally—is indefensible in production.
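One way to make the no-bypass invariant structural rather than conventional is to have the DMZ sign every approval and have the executor refuse anything unsigned. A minimal sketch, assuming a shared secret held by the DMZ and the executor but never by the agent:

```python
import hashlib
import hmac
import json

DMZ_SIGNING_KEY = b"dmz-only-secret"  # held by DMZ and executor, never the agent

def sign_approval(action: dict) -> str:
    """DMZ side: sign an approved action so the executor can verify its origin."""
    payload = json.dumps(action, sort_keys=True).encode()
    return hmac.new(DMZ_SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def execute_if_approved(action: dict, signature: str) -> None:
    """Executor side: anything that didn't pass through the DMZ fails closed."""
    payload = json.dumps(action, sort_keys=True).encode()
    expected = hmac.new(DMZ_SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("action was not approved by the DMZ")
    # ...dispatch to the real tool here...
```

A compromised decision layer can propose anything it likes, but it can't forge the signature, so the executor won't act.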
DMZ Architecture: Three-Layer Separation
An agent DMZ architecture separates the system into three distinct layers, each with different trust properties and responsibilities.
Figure: The Agent DMZ Architecture (Three-Layer Separation)
Layer 1: Untrusted Zone - Decision Making
This is where the agent lives. It receives user input, processes external context, and generates proposed actions. Everything in this zone is considered untrusted because the LLM's decision-making can be influenced by adversarial inputs.
Responsibilities:
- Process user requests and build context
- Generate tool call proposals with parameters
- Explain reasoning for proposed actions
- Receive execution results and continue workflow
Trust properties:
- No direct execution capability
- No credentials or access to protected resources
- Cannot bypass validation layer
- Outputs are proposals, not commands
Key design decision: The LLM has zero ambient authority. It can propose actions but cannot execute them. This prevents compromised decision-making from directly causing harm.
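In practice, zero ambient authority means the decision layer's output is a plain data object. A sketch of what that handoff might look like, using the `ProposedAction` dataclass defined in the validator code below and a hypothetical `call_llm` helper:

```python
import hashlib
import time
import uuid

def propose_action(user_request: str, context: str) -> "ProposedAction":
    """Decision layer: returns a proposal, never touches a tool."""
    decision = call_llm(user_request, context)  # hypothetical model call
    return ProposedAction(
        action_id=str(uuid.uuid4()),
        tool_name=decision["tool"],
        parameters=decision["params"],
        agent_id="agent-7",
        context_hash=hashlib.sha256(context.encode()).hexdigest(),
        reasoning=decision.get("reasoning", ""),
        timestamp=time.time(),
    )
```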
Layer 2: DMZ - Validation Layer
The neutral zone where proposed actions are inspected, validated, and either approved with scoping or rejected. This layer enforces policy without needing to understand agent internals or execution details.
Responsibilities:
- Validate proposed actions against security policies
- Check parameter scopes and bounds
- Analyze risk based on context and action type
- Generate scoped execution plans for approved actions
- Reject and log policy violations
- Audit all decisions
Trust properties:
- Stateless validation (no memory of past decisions unless explicitly designed)
- Enforces declarative policies, not procedural logic
- Cannot be bypassed by either decision or execution layers
- All traffic flows through validation—no exceptions
Key design decision: Validation is mandatory and automatic. There's no code path that allows an action to skip validation. The DMZ is the only route from decision to execution.
Layer 3: Trusted Zone - Execution Layer
Where validated actions actually execute. This layer has credentials, access to protected resources, and the capability to cause real changes. It trusts the validation layer to only send approved actions.
Responsibilities:
- Manage credentials and secrets
- Execute approved tool calls with scoped permissions
- Enforce execution-time constraints (timeouts, resource limits)
- Return execution results
- Log all operations
Trust properties:
- Only accepts actions from validation layer
- Executes with minimal privileges required for the scoped action
- Enforces execution-time safety regardless of validation approval
- Cannot be called directly by decision layer
Key design decision: Execution trusts validation but still enforces its own constraints. Defense in depth means even approved actions run with resource limits and timeouts.
Information flow:
Decisions flow one direction: untrusted → DMZ → trusted. Results flow back: trusted → DMZ → untrusted. The DMZ inspects both directions. Proposed actions are validated before reaching execution. Execution results are sanitized before returning to the agent (to prevent result poisoning attacks).
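Result sanitization on the return path gets less attention than request validation, but it's the same enforcement point. A minimal sketch of a pattern-based scrubber the DMZ might run on tool output before it re-enters the agent's context (the patterns are illustrative, not exhaustive):

```python
import re

# Things that should never flow back into the untrusted zone:
# embedded instructions and secret-shaped strings.
SCRUB_PATTERNS = [
    re.compile(r"(?i)ignore (all )?previous instructions"),
    re.compile(r"AKIA[0-9A-Z]{16}"),                     # AWS access key shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]

def sanitize_result(raw_output: str) -> str:
    """Scrub tool output before it re-enters the agent's context."""
    cleaned = raw_output
    for pattern in SCRUB_PATTERNS:
        cleaned = pattern.sub("[REDACTED BY DMZ]", cleaned)
    return cleaned
```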
Failure isolation:
If the decision layer is compromised, only invalid proposals reach the DMZ, where they're rejected. If the execution layer fails, the failure is contained and reported back through the DMZ. If the DMZ itself fails, the system fails closed: no execution happens.
Implementation: Building the Validation Layer
The DMZ's validation layer is the critical component. Here's what it looks like in production.
Action Validator Core
```python
from typing import Dict, Any, List, Optional
from dataclasses import dataclass
from enum import Enum
import time


class ValidationDecision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    REQUIRES_REVIEW = "requires_review"


@dataclass
class ProposedAction:
    """
    Action proposal from the untrusted decision layer.
    """
    action_id: str
    tool_name: str
    parameters: Dict[str, Any]
    agent_id: str
    context_hash: str  # Hash of context that influenced this decision
    reasoning: str     # LLM's explanation for why it chose this action
    timestamp: float


@dataclass
class ValidationResult:
    """
    DMZ validation decision.
    """
    decision: ValidationDecision
    scoped_action: Optional['ScopedAction']
    rejection_reason: Optional[str]
    risk_score: float
    policy_checks: Dict[str, bool]
    validation_time_ms: float


@dataclass
class ScopedAction:
    """
    Approved action with execution scoping applied.
    """
    original_action: ProposedAction
    scoped_parameters: Dict[str, Any]  # Modified parameters with safety bounds
    allowed_credentials: List[str]     # Which credentials can be used
    execution_timeout_ms: int
    max_cost_dollars: float
    allowed_failure_retries: int


class AgentDMZ:
    """
    The DMZ validation layer.

    Sits between decision-making and execution, enforcing policy.
    """

    def __init__(self):
        self.policy_engine = PolicyEngine()
        self.scope_checker = ScopeChecker()
        self.risk_analyzer = RiskAnalyzer()
        self.audit_logger = AuditLogger()

        # Performance tracking
        self.validation_metrics = ValidationMetrics()

    def validate_action(self, proposed: ProposedAction) -> ValidationResult:
        """
        The DMZ's core function: validate a proposed action.

        This is the single enforcement point.
        Every action must pass through here.
        """
        validation_start = time.time()

        # Step 1: Policy evaluation
        # Does this action violate any explicit security policies?
        policy_result = self.policy_engine.evaluate(
            tool_name=proposed.tool_name,
            parameters=proposed.parameters,
            agent_id=proposed.agent_id,
            context_hash=proposed.context_hash
        )

        if not policy_result.allowed:
            result = self._reject(
                proposed=proposed,
                reason=f"Policy violation: {policy_result.reason}",
                policy_checks=policy_result.checks
            )
            self._record_validation(proposed, result, validation_start)
            return result

        # Step 2: Scope checking
        # Are the parameters within acceptable bounds?
        scope_result = self.scope_checker.check_bounds(
            tool_name=proposed.tool_name,
            parameters=proposed.parameters
        )

        if not scope_result.valid:
            result = self._reject(
                proposed=proposed,
                reason=f"Parameter out of scope: {scope_result.violation}",
                policy_checks={"scope": False}
            )
            self._record_validation(proposed, result, validation_start)
            return result

        # Step 3: Risk analysis
        # How risky is this action given current context?
        risk_result = self.risk_analyzer.analyze(
            proposed_action=proposed,
            historical_behavior=self._get_agent_history(proposed.agent_id)
        )

        # High-risk actions require human review
        if risk_result.risk_score > 0.8:
            result = ValidationResult(
                decision=ValidationDecision.REQUIRES_REVIEW,
                scoped_action=None,
                rejection_reason=None,
                risk_score=risk_result.risk_score,
                policy_checks={"risk_threshold": False},
                validation_time_ms=(time.time() - validation_start) * 1000
            )
            self._record_validation(proposed, result, validation_start)
            return result

        # Step 4: Generate scoped execution plan
        # Even approved actions execute with constraints
        scoped_action = self._apply_scoping(
            proposed=proposed,
            scope_result=scope_result,
            risk_score=risk_result.risk_score
        )

        result = ValidationResult(
            decision=ValidationDecision.APPROVED,
            scoped_action=scoped_action,
            rejection_reason=None,
            risk_score=risk_result.risk_score,
            policy_checks=policy_result.checks,
            validation_time_ms=(time.time() - validation_start) * 1000
        )
        self._record_validation(proposed, result, validation_start)
        return result

    def _apply_scoping(
        self,
        proposed: ProposedAction,
        scope_result: ScopeCheckResult,
        risk_score: float
    ) -> ScopedAction:
        """
        Apply execution scoping based on risk.

        Higher risk = tighter scoping.
        """
        # Base scoping from scope checker
        scoped_params = scope_result.scoped_parameters

        # Adjust based on risk
        if risk_score > 0.6:
            # High risk: very tight scoping
            timeout_ms = 5000   # 5 seconds max
            max_cost = 0.10     # $0.10 max
            retries = 0         # No retries
        elif risk_score > 0.3:
            # Medium risk: moderate scoping
            timeout_ms = 15000  # 15 seconds
            max_cost = 1.00     # $1 max
            retries = 1
        else:
            # Low risk: relaxed scoping
            timeout_ms = 30000  # 30 seconds
            max_cost = 5.00     # $5 max
            retries = 2

        # Determine allowed credentials
        # Higher risk means more restricted credentials
        allowed_creds = self._get_scoped_credentials(
            tool_name=proposed.tool_name,
            risk_score=risk_score
        )

        return ScopedAction(
            original_action=proposed,
            scoped_parameters=scoped_params,
            allowed_credentials=allowed_creds,
            execution_timeout_ms=timeout_ms,
            max_cost_dollars=max_cost,
            allowed_failure_retries=retries
        )

    def _reject(
        self,
        proposed: ProposedAction,
        reason: str,
        policy_checks: Dict[str, bool]
    ) -> ValidationResult:
        """
        Reject an action and log the decision.
        """
        return ValidationResult(
            decision=ValidationDecision.REJECTED,
            scoped_action=None,
            rejection_reason=reason,
            risk_score=1.0,
            policy_checks=policy_checks,
            validation_time_ms=0
        )

    def _record_validation(
        self,
        proposed: ProposedAction,
        result: ValidationResult,
        start_time: float
    ):
        """
        Audit every validation decision.

        Critical for security investigation and compliance.
        """
        self.audit_logger.log({
            "action_id": proposed.action_id,
            "agent_id": proposed.agent_id,
            "tool_name": proposed.tool_name,
            "decision": result.decision.value,
            "risk_score": result.risk_score,
            "rejection_reason": result.rejection_reason,
            "validation_time_ms": result.validation_time_ms,
            "timestamp": time.time(),
            "context_hash": proposed.context_hash,
            "reasoning": proposed.reasoning
        })

        # Track metrics
        self.validation_metrics.record(
            decision=result.decision,
            validation_time=result.validation_time_ms,
            risk_score=result.risk_score
        )
```
Why this architecture works:
Single enforcement point: Every action flows through validate_action. No bypass paths exist in the code.
Stateless validation: Each validation is independent. The DMZ doesn't maintain agent state beyond historical behavior summaries. This prevents state manipulation attacks.
Graduated scoping: Not binary approve/reject. Approved actions execute with constraints proportional to risk. High-risk actions get tight timeouts and cost limits.
Comprehensive audit: Every validation decision is logged with full context. Post-incident analysis can reconstruct why any action was approved or rejected.
Performance tracking: Validation time is measured and tracked. If validation becomes a bottleneck, you'll know.
Policy Engine Implementation
Policies define what's allowed at a declarative level.
```python
class PolicyEngine:
    """
    Evaluates proposed actions against security policies.

    Policies are declarative rules that don't depend on
    execution details.
    """

    def __init__(self):
        self.policies = self._load_policies()

    def evaluate(
        self,
        tool_name: str,
        parameters: Dict[str, Any],
        agent_id: str,
        context_hash: str
    ) -> PolicyEvaluationResult:
        """
        Check if this action is allowed by policy.
        """
        checks = {}
        violations = []

        for policy in self.policies:
            if policy.applies_to(tool_name):
                allowed, reason = policy.evaluate(parameters, agent_id, context_hash)
                checks[policy.name] = allowed
                if not allowed:
                    violations.append(reason)

        return PolicyEvaluationResult(
            allowed=len(violations) == 0,
            checks=checks,
            reason=violations[0] if violations else None
        )

    def _load_policies(self) -> List[Policy]:
        """
        Load security policies.

        These encode organizational security requirements.
        """
        return [
            # No destructive database operations outside business hours
            TimeBasedPolicy(
                name="business_hours_db_writes",
                applies_to_pattern="database_.*",
                allowed_hours=range(9, 17),  # 9am - 5pm
                allowed_operations=["read"],
                denied_operations=["write", "delete", "update"]
            ),

            # Require human approval for bulk operations
            ParameterThresholdPolicy(
                name="bulk_operation_approval",
                applies_to_pattern=".*",
                parameter="row_limit",
                max_without_approval=100
            ),

            # Never allow direct credential access
            ToolBlacklistPolicy(
                name="no_credential_tools",
                denied_tools=["get_credentials", "read_secrets", "access_keys"]
            ),

            # External API calls must be to whitelisted domains
            DomainWhitelistPolicy(
                name="external_api_whitelist",
                applies_to_pattern="http_.*|fetch_.*",
                allowed_domains=["api.company.com", "trusted-partner.com"]
            )
        ]
```
Policies are declarative and composable. Adding a new security requirement means adding a new policy, not modifying validation logic.
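The policy classes above are referenced but not shown. Here's a sketch of what `TimeBasedPolicy` might look like, assuming every policy exposes the two methods the engine calls, `applies_to` and `evaluate`:

```python
import re
from datetime import datetime
from typing import Any, Dict, List, Tuple

class TimeBasedPolicy:
    """Restricts which operations a tool may perform outside allowed hours."""

    def __init__(self, name: str, applies_to_pattern: str,
                 allowed_hours: range, allowed_operations: List[str],
                 denied_operations: List[str]):
        self.name = name
        self.pattern = re.compile(applies_to_pattern)  # precompiled once
        self.allowed_hours = allowed_hours
        self.allowed_operations = allowed_operations   # permitted at any hour
        self.denied_operations = denied_operations     # blocked outside hours

    def applies_to(self, tool_name: str) -> bool:
        return bool(self.pattern.fullmatch(tool_name))

    def evaluate(self, parameters: Dict[str, Any], agent_id: str,
                 context_hash: str) -> Tuple[bool, str]:
        operation = parameters.get("operation", "read")
        in_hours = datetime.now().hour in self.allowed_hours
        if in_hours or operation in self.allowed_operations:
            return True, ""
        if operation in self.denied_operations:
            return False, f"{self.name}: '{operation}' denied outside allowed hours"
        return True, ""
```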
Execution Layer with Scoping
The execution layer trusts the DMZ but enforces its own constraints.
```python
class ExecutionLayer:
    """
    The trusted zone where validated actions actually execute.

    Enforces execution-time safety even for approved actions.
    """

    def __init__(self):
        self.credential_manager = CredentialManager()
        self.tool_registry = ToolRegistry()
        self.execution_monitor = ExecutionMonitor()

    def execute(self, scoped_action: ScopedAction) -> ExecutionResult:
        """
        Execute a validated action with scoping constraints.
        """
        # Retrieve scoped credentials
        credentials = self.credential_manager.get_scoped_credentials(
            allowed_credentials=scoped_action.allowed_credentials,
            ttl_seconds=scoped_action.execution_timeout_ms / 1000
        )

        # Get the tool executor
        tool = self.tool_registry.get_tool(
            scoped_action.original_action.tool_name
        )

        # Execute with constraints
        try:
            result = self.execution_monitor.execute_with_limits(
                tool=tool,
                parameters=scoped_action.scoped_parameters,
                credentials=credentials,
                timeout_ms=scoped_action.execution_timeout_ms,
                max_cost=scoped_action.max_cost_dollars
            )

            return ExecutionResult(
                success=True,
                output=result,
                cost=self.execution_monitor.last_execution_cost,
                duration_ms=self.execution_monitor.last_execution_duration
            )

        except TimeoutError:
            return ExecutionResult(
                success=False,
                error="Execution timeout exceeded",
                cost=0,
                duration_ms=scoped_action.execution_timeout_ms
            )

        except CostLimitExceeded as e:
            return ExecutionResult(
                success=False,
                error=f"Cost limit exceeded: ${e.cost}",
                cost=e.cost,
                duration_ms=self.execution_monitor.last_execution_duration
            )

        finally:
            # Always revoke credentials after execution
            self.credential_manager.revoke_credentials(credentials)
```
Even approved actions can't run indefinitely or burn unlimited budget. The execution layer enforces constraints independent of validation decisions.
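The `execute_with_limits` call above is where the timeout actually bites. A minimal sketch of one way to enforce a wall-clock deadline, using a worker thread (cost enforcement would hook into whatever billing signals the tool exposes, which varies too much to sketch generically):

```python
import threading
from typing import Any, Callable, Dict

class ToolTimeoutError(Exception):
    pass

def execute_with_timeout(tool_fn: Callable[..., Any],
                         parameters: Dict[str, Any],
                         timeout_ms: int) -> Any:
    """Run a tool call on a worker thread and stop waiting past the deadline.

    Note: abandoning a thread doesn't kill the work; for hard isolation
    you'd run tools in a subprocess or container and terminate it.
    """
    result: Dict[str, Any] = {}

    def worker():
        try:
            result["output"] = tool_fn(**parameters)
        except Exception as exc:  # surface tool errors to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout=timeout_ms / 1000)
    if thread.is_alive():
        raise ToolTimeoutError(f"tool exceeded {timeout_ms}ms")
    if "error" in result:
        raise result["error"]
    return result["output"]
```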
Pitfalls & Failure Modes
DMZ architectures fail in predictable ways. Here's what breaks in production.
Validation Becomes a Latency Bottleneck
Every action waits for validation. In a conversational agent with multiple tool calls per turn, validation latency compounds. A 50ms validation delay becomes 500ms across 10 tool calls. Users notice the slowdown.
Teams respond by caching validation decisions or creating fast paths for "safe" tools. Both break the DMZ model. Cached validations don't reflect current policy. Fast paths bypass the enforcement point.
Prevention: Validation must be fast—under 20ms at p99. This requires optimized policy evaluation, in-memory policy storage, and efficient risk analysis. You can't compromise on validation, so you must invest in performance.
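One concrete optimization, sketched under the assumption that policies match tools by pattern: resolve which policies apply to each tool once, then serve the hot path from that index so evaluation never rescans the full policy list.

```python
from typing import Dict, List

class IndexedPolicyEngine:
    """Caches the tool-name -> applicable-policies resolution so the
    hot path skips pattern matching after the first lookup."""

    def __init__(self, policies: List["Policy"]):
        self.policies = policies
        self._by_tool: Dict[str, List["Policy"]] = {}

    def policies_for(self, tool_name: str) -> List["Policy"]:
        cached = self._by_tool.get(tool_name)
        if cached is None:
            cached = [p for p in self.policies if p.applies_to(tool_name)]
            self._by_tool[tool_name] = cached
        return cached
```

The cache must be invalidated whenever policies change, which is one more reason to version policy updates explicitly.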
False Rejections Erode Trust
The DMZ rejects a legitimate action because the policy is too strict or the risk analyzer is too conservative. The agent can't complete the user's task. This happens repeatedly. Engineers start building exceptions, override mechanisms, or "trusted agent" modes that bypass the DMZ.
Any bypass mechanism destroys the architecture. The DMZ only works if it's mandatory.
Prevention: Tune policies based on false rejection rates. Implement graduated responses—high-risk actions require human approval rather than automatic rejection. Make the approval process fast enough that it's tolerable for edge cases.
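A sketch of the graduated response: REQUIRES_REVIEW actions park in an approval queue with a deadline instead of dying on the spot. `approval_recorded` is a hypothetical lookup against wherever approvals land (a database, a ticket system).

```python
import queue
import time
from dataclasses import dataclass

@dataclass
class ReviewRequest:
    action_id: str
    risk_score: float
    submitted_at: float

review_queue: "queue.Queue[ReviewRequest]" = queue.Queue()

def approval_recorded(action_id: str) -> bool:
    """Hypothetical lookup against your approval store."""
    return False  # placeholder: wire to a real backend

def route_for_review(action_id: str, risk_score: float,
                     deadline_seconds: float = 300) -> bool:
    """Park a high-risk action for human approval instead of hard-rejecting.
    Returns False (fail closed) if nobody approves before the deadline."""
    review_queue.put(ReviewRequest(action_id, risk_score, time.time()))
    deadline = time.time() + deadline_seconds
    while time.time() < deadline:
        if approval_recorded(action_id):
            return True
        time.sleep(1)
    return False  # deadline passed: treat as rejection
```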
Policy Drift from Implementation
Security policies are defined in code. Those policies evolve as the organization's risk tolerance changes. But policy updates require code changes, testing, and deployment. Policy evolution lags behind business needs. Engineers implement workarounds that bypass outdated policies.
Prevention: Treat policies as configuration, not code. Store policies in a database or configuration system that can be updated without code deployment. Version policies and track changes with approval workflows.
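A sketch of policies-as-configuration, assuming a registry that maps declarative type names onto the policy classes from the engine above. The schema and version string are illustrative:

```python
import json

# Hypothetical versioned policy document, updated without a code deploy.
POLICY_CONFIG = """
{
  "version": "2024-06-01.3",
  "policies": [
    {"type": "parameter_threshold", "name": "bulk_operation_approval",
     "applies_to_pattern": ".*", "parameter": "row_limit",
     "max_without_approval": 100}
  ]
}
"""

POLICY_TYPES = {"parameter_threshold": ParameterThresholdPolicy}  # class registry

def load_policies(config_text: str):
    """Turn a declarative policy document into live policy objects."""
    doc = json.loads(config_text)
    version = doc["version"]
    policies = [POLICY_TYPES[p.pop("type")](**p) for p in doc["policies"]]
    return version, policies
```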
Execution Layer Trust Violations
The execution layer is supposed to trust the DMZ's validation decisions. But execution layer engineers see tool calls that "shouldn't have been approved" based on their understanding of safety. They add their own validation logic in the execution layer. Now you have two validation layers with potentially conflicting policies.
Prevention: Execution layer enforces execution-time constraints (timeouts, cost limits) but doesn't make allow/deny decisions based on business logic. That's the DMZ's job. Clear separation of concerns prevents duplication and conflicts.
Audit Log Volume Becomes Unmanageable
Every validation decision is logged. Every execution is logged. At scale, this is gigabytes of audit logs daily. Log storage costs exceed compute costs. Searching logs for incident analysis takes hours. Teams reduce logging to manage costs, losing visibility.
Prevention: Structured logging with proper retention tiers. High-detail logs for recent data (7 days), summary logs for medium-term (90 days), aggregated statistics for long-term (1 year+). Separate audit logs (compliance) from operational logs (debugging).
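The tiering above can be expressed directly as configuration. A sketch, with placeholder store names:

```python
from dataclasses import dataclass

@dataclass
class RetentionTier:
    store: str          # hypothetical destination name
    retention_days: int
    detail: str

# Illustrative tiering matching the scheme above.
RETENTION_TIERS = [
    RetentionTier(store="hot-search-index", retention_days=7, detail="full"),
    RetentionTier(store="warm-object-store", retention_days=90, detail="summary"),
    RetentionTier(store="cold-aggregates", retention_days=365, detail="statistics"),
]

def tier_for_age(age_days: int) -> RetentionTier:
    """Pick the cheapest tier whose window still covers a record's age."""
    for tier in RETENTION_TIERS:
        if age_days <= tier.retention_days:
            return tier
    raise LookupError("record past all retention windows; eligible for deletion")
```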
Summary & Next Steps
The Agent DMZ pattern solves the fundamental problem of agents having direct execution authority. By separating decision-making from execution with a mandatory validation layer, you create a defensible architecture that protects against compromised decisions without eliminating agent autonomy.
The three-layer model is straightforward: the untrusted decision layer proposes actions, the DMZ validates and scopes them, and the trusted execution layer carries them out with constraints. Each layer has distinct responsibilities and trust properties. Every proposal and every result passes through the DMZ; no bypass paths exist.
The implementation challenge is making validation fast enough not to degrade user experience. Sub-20ms validation latency at p99 requires optimized policy engines, efficient risk analysis, and careful performance engineering. But it's achievable with proper investment.
The operational challenge is policy maintenance. Security policies must evolve with business needs without creating deployment friction. Treating policies as configuration rather than code helps, as does automated policy testing and staged rollouts.
Here's what to build next:
Implement the DMZ first: Don't deploy agents with direct execution capability. Build the validation layer before production rollout. Retrofitting DMZ architecture is harder than building it correctly from the start.
Define explicit policies: Security requirements should be encoded as policies, not implicit in model behavior. Start with conservative policies and loosen based on false rejection rates.
Instrument validation performance: Track validation latency at every percentile. If p99 exceeds 20ms, investigate and optimize. Slow validation creates pressure to bypass it.
Build comprehensive audit infrastructure: Log every validation decision and execution result. You'll need this for incident response, compliance, and policy tuning.
Test policy changes in staging: Policy updates are security-sensitive. Test them thoroughly before production deployment. Track false positive and false negative rates.
The DMZ pattern isn't optional for production agents—it's the minimum viable security architecture. The question is whether you build it before or after your first major incident.
Related Articles
- Zero Trust Agents: Why 'Verify Every Tool Call' Is the Only Defensible Architecture
- Context Sandboxing: How to Prevent Tool Response Poisoning in Agentic Systems
- The Tool Execution Firewall: Pattern-Based Defense for Agent Actions