In August 2024, the EU AI Act entered into force. On August 2, 2025, governance rules for general-purpose AI models became applicable. On August 2, 2026, the full compliance requirements take effect for high-risk AI systems, including transparency obligations and record-keeping mandates.
If your agents touch credit decisioning, employment screening, regulatory reporting, critical infrastructure, or any personally identifiable information at scale, you are already in scope. The question is not whether you need an audit trail. The question is whether the audit trail you have will satisfy a regulator who shows up and asks: "Show me every decision this agent made, the inputs it received, the reasoning it applied, and the human oversight point where a person could have intervened."
Most teams cannot answer that question. Not because they lack logs - they have logs - but because logs are not an audit trail. An audit trail is structured, immutable, correlated across agents, attributed to agent versions, and queryable on demand. A log file is none of these things by default.
What Regulators Actually Require
The EU AI Act's requirements for high-risk AI systems translate into five concrete technical obligations:
Article 9 - Risk Management: Ongoing, evidence-based risk assessment at every stage of deployment. Not a one-time document. An active system.
Article 12 - Record Keeping: All inputs, outputs, and reasoning steps must be logged with sufficient detail to reconstruct agent decision paths after the fact.
Article 13 - Transparency: The system must be interpretable by its deployers. Every output must be traceable to the inputs and model version that produced it.
Article 14 - Human Oversight: Structured intervention points where a human can monitor performance, override decisions, and stop execution. Not a theoretical possibility - a deployed mechanism.
Article 15 - Accuracy and Robustness: Evidence of continuous monitoring and documented response to performance degradation.
These are not soft requirements. Italy's AI Law (Law 132/2025), which entered into force in October 2025, established fines of up to EUR 774,685 for violations. The EU AI Act's penalties go further: up to EUR 35 million or 7% of global annual turnover for the most serious infringements.
The architectural implication: compliance is not a layer you add after you build the system. It is a property of the system's design. Teams that built agents without considering Articles 9, 12, 13, 14, and 15 face structural rework, not documentation updates.
Wrong Way: Logs Are Not an Audit Trail
The most common compliance failure is treating application logs as audit evidence.
```python
# Wrong way: using standard logging as an "audit trail".
# This is what most teams have today. It satisfies nothing in Articles 9-15.
# (extraction_agent and enrichment_agent stand in for your existing agents.)
import logging

logger = logging.getLogger(__name__)


def process_invoice_naive(invoice_text: str, user_id: str) -> dict:
    logger.info(f"Processing invoice for user {user_id}")  # PII in logs
    result = extraction_agent.invoke(invoice_text)
    logger.info(f"Extraction complete: {result}")  # full output in logs
    enriched = enrichment_agent.invoke(result)
    logger.info("Enrichment complete")
    return enriched

# What this fails on:
# Article 12: Logs are mutable - they can be rotated, deleted, overwritten.
#   No integrity hash. No sequence number. Not an audit record.
# Article 13: Cannot answer "which model version produced this output?"
#   Log entries have no model_version or policy_version field.
# Article 14: No human oversight record. No reviewer_id. No approval timestamp.
#   Cannot prove a human could have intervened.
# Article 9: No agent registry. Cannot demonstrate ongoing risk management.
# GDPR: Raw user_id in logs violates data minimization. PII in result logs
#   creates a secondary PII store without a legal basis.
```

The structural problem with logs: they are optimized for debugging by engineers, not for evidence for regulators. The fields regulators need - model version, policy version, integrity hash, reviewer identity, data classifications - are not in a standard logging schema. Adding them to log messages is not a solution. Those fields need to be first-class columns in an append-only record store, not substrings in a log line that gets rotated after 30 days.
The Audit Trail Architecture
What I call the Governed Agent Architecture - the combination of an agent registry, an immutable audit trail, policy gate integration, human oversight interrupts, and fleet telemetry - is what closes the gap between "running agents" and "auditable agents." Each component satisfies different articles. None is sufficient alone. The audit trail at the center must have four properties that application logs lack:
- Immutability - entries cannot be modified after creation. Standard application logs can be overwritten, rotated, or accidentally deleted. Audit records cannot.
- Correlation - every record links to a session ID, an agent version, a user identity, and a timestamp. Isolated log lines are not an audit trail. Records that can be joined across agents for a single end-to-end execution are.
- Completeness - inputs, outputs, tool calls, policy decisions, halt events, and human intervention points are all captured. A record of the final output without the intermediate reasoning steps does not satisfy Article 12.
- Queryability - a regulator's question ("show me all decisions made by this agent on behalf of this user in October") must be answerable in minutes, not days of log archaeology.
```python
# audit_trail.py
# Immutable audit trail for agentic systems.
# Integrates with the OTel telemetry from Part 1 and policy decisions from Part 2.
from __future__ import annotations

import hashlib
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Optional


class AuditEventType(Enum):
    # Agent execution events
    AGENT_INVOKED = "agent_invoked"
    AGENT_COMPLETED = "agent_completed"
    AGENT_HALTED = "agent_halted"  # pipeline halt from Part 3

    # Policy events (from Part 2's Dual-Layer Gate Model)
    POLICY_DECISION = "policy_decision"
    GATE_TRIGGERED = "gate_triggered"

    # Human oversight events (Article 14)
    HUMAN_REVIEW_REQUESTED = "human_review_requested"
    HUMAN_APPROVED = "human_approved"
    HUMAN_OVERRIDDEN = "human_overridden"
    HUMAN_REJECTED = "human_rejected"

    # Tool call events
    TOOL_CALLED = "tool_called"
    TOOL_RESULT_RECEIVED = "tool_result_received"

    # Cost events (from Part 5)
    BUDGET_WARNING = "budget_warning"
    BUDGET_HALTED = "budget_halted"


@dataclass
class AuditRecord:
    """
    A single immutable audit record.
    The integrity_hash field makes tampering detectable.
    """

    # Identity
    record_id: str   # unique record identifier
    session_id: str  # ties all records for one pipeline run
    agent_type: str
    agent_version: str
    event_type: str  # AuditEventType value

    # Temporal
    timestamp_utc: float  # Unix timestamp, UTC
    sequence_number: int  # monotonic within session

    # Content
    input_summary: Optional[str]   # hash or truncated summary - not full PII content
    output_summary: Optional[str]  # hash or truncated summary
    tool_name: Optional[str]
    policy_decision: Optional[str]
    halt_reason: Optional[str]
    human_actor_id: Optional[str]  # if human intervention occurred

    # Regulatory metadata
    user_identity_hash: Optional[str]  # hashed user ID for GDPR-safe attribution
    data_classifications: list[str]    # ["PII", "FINANCIAL"] - from Part 2 policy
    model_version: str                 # model used for this invocation
    policy_version: str                # OPA policy version from Part 2

    # Integrity (default "" so asdict() works before the hash is computed)
    integrity_hash: str = field(init=False, default="")

    def __post_init__(self) -> None:
        self.integrity_hash = self._compute_hash()

    def _compute_hash(self) -> str:
        """
        SHA-256 hash over all record fields. Any modification to the record
        changes the hash - tampering is detectable.
        """
        record_data = {
            k: v for k, v in asdict(self).items() if k != "integrity_hash"
        }
        canonical = json.dumps(record_data, sort_keys=True, default=str)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def verify_integrity(self) -> bool:
        """Returns True if the record has not been modified since creation."""
        return self._compute_hash() == self.integrity_hash


class AuditTrailWriter:
    """
    Writes immutable audit records to a durable, append-only store.

    In production: write to an append-only database (PostgreSQL with an
    insert-only policy) or a dedicated audit log service (AWS CloudTrail,
    Azure Monitor Logs). Never write to a mutable log file.
    """

    def __init__(self, session_id: str, agent_type: str, agent_version: str,
                 model_version: str, policy_version: str) -> None:
        self.session_id = session_id
        self.agent_type = agent_type
        self.agent_version = agent_version
        self.model_version = model_version
        self.policy_version = policy_version
        self._sequence = 0
        self._records: list[AuditRecord] = []  # in production: write to DB, not memory

    def record(
        self,
        event_type: AuditEventType,
        input_summary: Optional[str] = None,
        output_summary: Optional[str] = None,
        tool_name: Optional[str] = None,
        policy_decision: Optional[str] = None,
        halt_reason: Optional[str] = None,
        human_actor_id: Optional[str] = None,
        user_identity_hash: Optional[str] = None,
        data_classifications: Optional[list[str]] = None,
    ) -> AuditRecord:
        """
        Write one audit record. Returns the record for caller inspection.
        In production: persist to the append-only store before returning.
        """
        self._sequence += 1
        record = AuditRecord(
            record_id=str(uuid.uuid4()),
            session_id=self.session_id,
            agent_type=self.agent_type,
            agent_version=self.agent_version,
            event_type=event_type.value,
            timestamp_utc=time.time(),
            sequence_number=self._sequence,
            input_summary=input_summary,
            output_summary=output_summary,
            tool_name=tool_name,
            policy_decision=policy_decision,
            halt_reason=halt_reason,
            human_actor_id=human_actor_id,
            user_identity_hash=user_identity_hash,
            data_classifications=data_classifications or [],
            model_version=self.model_version,
            policy_version=self.policy_version,
        )
        self._records.append(record)  # replace with a DB write in production
        return record


def _hash_pii(value: str) -> str:
    """
    One-way hash for PII fields in audit records. Allows correlation
    ("did this user appear in sessions last October?") without storing raw
    PII in the audit log. Use a secret salt in production to prevent
    rainbow table attacks.
    """
    return hashlib.sha256(value.encode()).hexdigest()[:16]
```
Human Oversight as an Architectural Constraint

Article 14 requires "structured intervention points where a human can monitor performance and override decisions." In LangGraph, this maps directly to the interrupt mechanism. But the requirement goes beyond the technical capability. The human oversight must be:
- Reachable - the system routes to a human review queue before executing high-stakes actions
- Logged - human decisions (approve, reject, override) are audit records themselves
- Bounded - a timeout policy exists: what happens if the human doesn't respond within N hours?
```python
# human_oversight.py
# Article 14-compliant human oversight integration.
# Wraps LangGraph interrupts with audit logging and timeout handling.
from dataclasses import dataclass
from enum import Enum
from typing import Any

from langgraph.types import interrupt

from audit_trail import AuditEventType, AuditTrailWriter


class HumanDecision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    OVERRIDDEN = "overridden"  # human modified the agent's proposed action
    TIMED_OUT = "timed_out"    # no human response within the SLA window


@dataclass
class HumanReviewRequest:
    session_id: str
    agent_type: str
    proposed_action: dict[str, Any]
    risk_reason: str  # why human review was triggered
    data_classifications: list[str]
    timeout_hours: float = 4.0  # SLA for human response


def request_human_review(
    request: HumanReviewRequest,
    audit: AuditTrailWriter,
) -> HumanDecision:
    """
    Interrupts the LangGraph pipeline and waits for a human decision.
    Records both the request and the decision in the audit trail.
    Article 14 compliant: a structured intervention point with full audit.
    """
    # Record the review request in the audit trail (Article 12)
    audit.record(
        AuditEventType.HUMAN_REVIEW_REQUESTED,
        input_summary=f"Proposed action: {request.proposed_action.get('tool')}",
        policy_decision=request.risk_reason,
        data_classifications=request.data_classifications,
    )

    # LangGraph interrupt: the pipeline pauses here, state is checkpointed.
    # The human review queue receives the interrupt payload. Execution
    # resumes only after a human responds via the LangGraph API.
    human_response = interrupt({
        "review_request": {
            "session_id": request.session_id,
            "agent_type": request.agent_type,
            "proposed_action": request.proposed_action,
            "risk_reason": request.risk_reason,
            "data_classifications": request.data_classifications,
            "timeout_hours": request.timeout_hours,
        }
    })

    # Record the human decision (Article 14)
    decision = HumanDecision(human_response.get("decision", "timed_out"))
    audit.record(
        AuditEventType.HUMAN_APPROVED if decision == HumanDecision.APPROVED
        else AuditEventType.HUMAN_REJECTED,
        human_actor_id=human_response.get("reviewer_id"),
        policy_decision=f"Human decision: {decision.value}",
    )
    return decision
```

The key requirement: the audit record for a human decision must include the `reviewer_id` - a traceable identity for the person who made the decision. Anonymous approvals do not satisfy Article 14. The reviewer is accountable for their decision, and the audit trail proves who made it.
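The reviewer side of the interrupt is worth seeing end to end. Below is a minimal, self-contained sketch of the pause/resume mechanics that `request_human_review` relies on - the one-node graph, thread ID, and reviewer identity are hypothetical stand-ins for the real pipeline:

```python
# review_resume_sketch.py -- minimal harness for LangGraph interrupt/resume.
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt


class ReviewState(TypedDict):
    decision: str


def review_node(state: ReviewState) -> ReviewState:
    # Pauses here; state is checkpointed until a resume arrives.
    response = interrupt({"review_request": {"proposed_action": "wire_transfer"}})
    return {"decision": response["decision"]}


builder = StateGraph(ReviewState)
builder.add_node("review", review_node)
builder.add_edge(START, "review")
builder.add_edge("review", END)
graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "sess-001"}}
graph.invoke({"decision": ""}, config=config)  # runs until the interrupt

# Path 1: a human approves. reviewer_id is mandatory - anonymous approvals
# do not satisfy Article 14.
result = graph.invoke(
    Command(resume={"decision": "approved", "reviewer_id": "jane.doe@company.com"}),
    config=config,
)

# Path 2 (alternative): an SLA reaper job resumes sessions whose review
# request exceeded timeout_hours:
# graph.invoke(Command(resume={"decision": "timed_out"}), config=config)
```

The commented-out second resume is the timeout path: a scheduled reaper finds interrupted sessions older than `timeout_hours` and resumes them as `timed_out`, so a silent review queue cannot stall a pipeline indefinitely.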
Data Classification and PII Handling in Audit Records
Audit records themselves become a PII risk if they store raw inputs and outputs verbatim. A record of "user asked about their account balance and the agent replied with $12,445.00" is a financial data point tied to a user identity.
The correct approach:
- Hash identifiers - store `user_identity_hash` (a one-way hash of the user ID), not the user ID or name
- Summarize, don't store - store an input hash or a short semantic summary, not the full prompt text
- Tag, don't embed - store data classification tags (`["PII", "FINANCIAL"]`) rather than the data itself
This satisfies Article 12's traceability requirement while respecting GDPR's data minimization principle. The audit record proves that PII was processed, what classification it carried, and what policy governed it - without becoming a second vector for PII leakage.
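A sketch of what a record call looks like under these three rules. The keyed hash (HMAC with a secret salt) is one reasonable way to implement the salting the earlier `_hash_pii` comment calls for; all names and values here are illustrative:

```python
# pii_safe_record.py -- illustrative sketch; assumes the audit_trail module above.
import hashlib
import hmac
import os

from audit_trail import AuditEventType, AuditTrailWriter

# Secret salt from the environment: hashes stay correlatable across sessions
# but cannot be reversed with a rainbow table.
AUDIT_HASH_KEY = os.environ.get("AUDIT_HASH_KEY", "dev-only-not-for-production").encode()


def hash_user_id(user_id: str) -> str:
    """Keyed one-way hash of a user identity (rule 1: hash identifiers)."""
    return hmac.new(AUDIT_HASH_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]


def summarize_input(prompt_text: str) -> str:
    """Digest plus length, never the prompt itself (rule 2: summarize, don't store)."""
    digest = hashlib.sha256(prompt_text.encode()).hexdigest()[:12]
    return f"sha256:{digest} ({len(prompt_text)} chars)"


audit = AuditTrailWriter(
    session_id="sess-2026-10-07-002", agent_type="support_agent",
    agent_version="2.1.0", model_version="gpt-4o-2024-11-20",
    policy_version="opa-policy-v42",
)

# The record proves PII was processed and under which classification
# (rule 3: tag, don't embed) without storing the balance or the question.
audit.record(
    AuditEventType.AGENT_INVOKED,
    input_summary=summarize_input("user asked about their account balance"),
    user_identity_hash=hash_user_id("user-8842"),
    data_classifications=["PII", "FINANCIAL"],
)
```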
The Agent Registry: Article 9 in Practice
Article 9 requires an ongoing risk management system. For agentic systems, the first concrete deliverable is an agent registry: a live record of every agent deployed, its risk classification, its granted permissions, and its human oversight requirements.
```python
# agent_registry.py
# Live agent registry for Article 9 compliance.
# Every deployed agent has an entry. The registry is the evidence
# that risk management is an ongoing, evidence-based process.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class RiskLevel(Enum):
    MINIMAL = "minimal"            # classification, routing - no PII, no financial impact
    LIMITED = "limited"            # extraction, enrichment - PII-adjacent
    HIGH = "high"                  # financial, employment, healthcare decisions
    UNACCEPTABLE = "unacceptable"  # prohibited under Article 5


@dataclass
class AgentRegistryEntry:
    """
    One entry per deployed agent type + version combination.
    Required to answer Article 9's "ongoing, evidence-based risk assessment."
    """

    agent_type: str
    agent_version: str
    risk_level: RiskLevel

    # Article 13: transparency
    model_family: str   # e.g. "gpt-4o", "claude-sonnet"
    model_version: str  # exact model version string
    purpose: str        # plain-language description of what this agent does

    # Article 14: human oversight
    requires_human_review: bool
    human_review_trigger: Optional[str]      # condition that triggers review
    human_review_sla_hours: Optional[float]

    # Article 12: record keeping
    audit_trail_enabled: bool
    data_classifications_in_scope: list[str]
    retention_days: int  # how long audit records are kept

    # Permissions (from Part 2's Dual-Layer Gate Model)
    allowed_tools: list[str]
    max_spend_per_request_usd: float

    # Lifecycle
    deployed_at: datetime
    last_reviewed_at: datetime
    review_due_at: datetime  # Article 9 requires periodic re-review
    deployed_by: str         # engineer responsible for this version
    approved_by: str         # platform/security team approval


# Example registry entries
AGENT_REGISTRY: list[AgentRegistryEntry] = [
    AgentRegistryEntry(
        agent_type="invoice_extractor",
        agent_version="1.4.0",
        risk_level=RiskLevel.LIMITED,
        model_family="gpt-4o",
        model_version="gpt-4o-2024-11-20",
        purpose="Extracts structured fields from invoice documents "
                "for accounts payable processing",
        requires_human_review=True,
        human_review_trigger="amount > 500000 or vendor_trust_score < 0.4",
        human_review_sla_hours=4.0,
        audit_trail_enabled=True,
        data_classifications_in_scope=["FINANCIAL", "PII"],
        retention_days=730,  # 2 years for financial records
        allowed_tools=["crm_read", "vendor_lookup"],
        max_spend_per_request_usd=0.05,
        deployed_at=datetime(2026, 4, 1),
        last_reviewed_at=datetime(2026, 4, 1),
        review_due_at=datetime(2026, 7, 1),  # quarterly review
        deployed_by="engineer@company.com",
        approved_by="platform-security@company.com",
    ),
]
```

The agent registry answers the question regulators ask: "What AI systems do you operate, who is responsible for them, and when were they last reviewed?" A spreadsheet is not sufficient in 2026. An API-queryable registry that is updated with every deployment is.
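"API-queryable" cashes out as functions like the two below - a sketch over the in-memory `AGENT_REGISTRY` above; in production these would be endpoints on a database-backed registry service:

```python
# registry_queries.py -- sketch; assumes the agent_registry module above.
from datetime import datetime

from agent_registry import AGENT_REGISTRY, AgentRegistryEntry, RiskLevel


def entries_due_for_review(
    registry: list[AgentRegistryEntry], as_of: datetime
) -> list[AgentRegistryEntry]:
    """Article 9 evidence query: which agents have missed their re-review date?"""
    return [e for e in registry if e.review_due_at <= as_of]


def high_risk_without_oversight(
    registry: list[AgentRegistryEntry],
) -> list[AgentRegistryEntry]:
    """Gap detector: HIGH-risk agents with no human review violate Article 14."""
    return [
        e for e in registry
        if e.risk_level == RiskLevel.HIGH and not e.requires_human_review
    ]


overdue = entries_due_for_review(AGENT_REGISTRY, as_of=datetime(2026, 8, 1))
gaps = high_risk_without_oversight(AGENT_REGISTRY)
```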
Diagram: Compliance-Ready Agentic System Architecture
The diagram shows the full compliance architecture mapped to the EU AI Act articles it satisfies. Every component feeds evidence to the audit trail - the single source of truth a regulator queries.
```mermaid
flowchart TD
subgraph Registry["Agent Registry (Article 9)"]
REG["Agent Type + Version\nRisk Level\nPermission Envelope\nReview Due Date"]
end
subgraph Execution["Agent Execution"]
A["Agent Node\n(versioned)"]
PG["Policy Gate\nOPA Dual-Layer\nPolicy version tracked"]
SV["Semantic\nVerification"]
end
subgraph Oversight["Human Oversight (Article 14)"]
HR["Interrupt\nHuman Review Queue"]
HD{"Decision"}
APR["Approved\nReviewer ID logged"]
REJ["Rejected\nReason logged"]
end
subgraph AuditStore["Immutable Audit Trail (Article 12)"]
AT["Append-Only Store\nIntegrity Hash\nSession Correlation\nTimestamp + Sequence"]
end
subgraph Telemetry["Fleet Telemetry (Article 15)"]
OTel["OTel Collector"]
DASH["Compliance Dashboard\nDrift + Quality Metrics"]
end
REGULATOR["Regulator Query\nArticle 13 Transparency"]
REG -->|"risk classification\ninforms gate thresholds"| PG
A --> PG --> SV
SV -->|"high-risk action"| HR --> HD
HD --> APR & REJ
A -->|"AGENT_INVOKED\nAGENT_COMPLETED"| AT
PG -->|"POLICY_DECISION\nGATE_TRIGGERED"| AT
APR & REJ -->|"HUMAN_APPROVED\nHUMAN_REJECTED\nreviewer_id"| AT
REG -->|"registry snapshot\non change"| AT
A --> OTel --> DASH
DASH -->|"Article 15 evidence\nongoing monitoring"| AT
AT -->|"queryable on demand"| REGULATOR
style REG fill:#9B59B6,color:#fff
style A fill:#4A90E2,color:#fff
style PG fill:#7B68EE,color:#fff
style SV fill:#7B68EE,color:#fff
style HR fill:#FFD93D,color:#333
style HD fill:#FFD93D,color:#333
style APR fill:#6BCF7F,color:#fff
style REJ fill:#E74C3C,color:#fff
style AT fill:#6BCF7F,color:#fff
style OTel fill:#98D8C8,color:#333
style DASH fill:#98D8C8,color:#333
style REGULATOR fill:#FFA07A,color:#333
```
Every arrow that points to the Audit Trail represents an Article 12 record. The Registry feeds Article 9 evidence. Human oversight arrows carry Article 14 accountability. The telemetry path satisfies Article 15's ongoing monitoring requirement. The regulator node at the bottom has one path to it: the audit trail - which is the only path that matters on inspection day.
Connecting the Full Control Plane
This article closes the AI Control Plane series. The six parts form a layered architecture:
| Part | Layer | What It Governs |
|---|---|---|
| 1 - Unified Observability | Telemetry | What is happening across the fleet |
| 2 - Global Policy Enforcement | Policy | What the fleet is permitted to do |
| 3 - Failure Propagation | Reliability | What happens when an agent fails |
| 4 - Agent Versioning | Deployment | How new versions enter the fleet safely |
| 5 - Cost Governance | Economics | What the fleet is allowed to cost |
| 6 - Compliance and Audit | Accountability | What the fleet must be able to prove |
The Telemetry Surface Gap from Part 1 is closed from the compliance side by the audit trail: every agent action that emits an OTel span also generates an audit record. One instrumentation pass serves both operational observability and regulatory accountability.
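One way to realize that single pass is a wrapper that emits both evidence streams from the same call site. A sketch assuming the OpenTelemetry Python SDK and the `AuditTrailWriter` above - the decorator name is mine, not an established API:

```python
# dual_emit.py -- sketch: one decorator, two evidence streams.
import functools

from opentelemetry import trace

from audit_trail import AuditEventType, AuditTrailWriter

tracer = trace.get_tracer("agent-fleet")


def audited_span(audit: AuditTrailWriter, agent_name: str):
    """Wraps an agent call so every invocation produces an OTel span
    (operational observability) and an audit record (Article 12 evidence)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with tracer.start_as_current_span(f"agent.{agent_name}") as span:
                span.set_attribute("agent.version", audit.agent_version)
                span.set_attribute("policy.version", audit.policy_version)
                audit.record(AuditEventType.AGENT_INVOKED)
                result = fn(*args, **kwargs)
                audit.record(AuditEventType.AGENT_COMPLETED,
                             output_summary=str(result)[:80])
                return result
        return wrapper
    return decorator


# Usage (hypothetical agent function):
# @audited_span(audit, "invoice_extractor")
# def run_extraction(invoice_text: str) -> dict:
#     ...
```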
The Dual-Layer Gate Model from Part 2 generates the policy decision records that satisfy Article 9's evidence requirement. The OPA policy version tracked in every audit record proves that risk management is ongoing - each policy change is versioned, deployed to all agents simultaneously, and recorded.
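That evidence is directly queryable: the spread of policy versions across policy-decision records over a quarter is a timeline of risk-management activity. A minimal sketch over a list of `AuditRecord`s:

```python
# policy_evidence.py -- sketch; assumes the audit_trail module above.
from collections import Counter

from audit_trail import AuditRecord


def policy_version_timeline(records: list[AuditRecord]) -> dict[str, int]:
    """Counts policy-gated decisions per OPA policy version - evidence that
    policy management is ongoing (Article 9), not a one-time document."""
    gated = (r for r in records if r.event_type == "policy_decision")
    return dict(Counter(r.policy_version for r in gated))
```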
Compliance Checklist
Audit trail infrastructure:
- Append-only, immutable audit store configured (PostgreSQL insert-only policy, or dedicated service)
- `AuditRecord.integrity_hash` computed and verified on read - tampering is detectable
- Every agent invocation, tool call, policy decision, halt, and human decision generates an audit record
- `session_id` correlates all records for one end-to-end pipeline execution
- PII not stored verbatim - user identities stored as one-way hashes
- Retention policy set: minimum 2 years for financial/employment agents; match your regulatory regime
Agent registry:
- Every deployed agent type + version has a registry entry before receiving production traffic
- Risk level assessed per agent type (Article 9)
- Quarterly review dates set per entry - compliance is ongoing, not one-time
- `approved_by` field populated - platform/security team sign-off required before production
Human oversight:
- Interrupt points implemented for all high-risk actions (Article 14)
- Human review triggers documented per agent type
- Human review SLA defined with timeout behavior: what happens if no response in N hours?
- Human reviewer identity recorded in audit trail - anonymous approvals do not satisfy Article 14
- Human override capability tested and documented
Regulatory mapping:
- Agent types classified under EU AI Act risk tiers (unacceptable / high / limited / minimal)
- High-risk agents assessed against Articles 9-15 explicitly - gaps documented and tracked
- If operating in the EU or serving users in the EU: the August 2, 2026 compliance deadline applies
- Vendor compliance clauses: third-party AI services have contractual commitments matching your obligations
References
- European Commission. EU AI Act - Full Text and Implementation Timeline. https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
- CMS Law. Agentic AI, Risk and Compliance Under the EU AI Act. https://cms.law/en/gbr/legal-updates/agentic-ai-and-the-eu-ai-act2
- Artificial Intelligence News. (April 2026). Agentic AI's Governance Challenges Under the EU AI Act in 2026. https://www.artificialintelligence-news.com/news/agentic-ais-governance-challenges-under-the-eu-ai-act-in-2026/
- LegalNodes. (April 2026). EU AI Act 2026 Updates: Compliance Requirements and Business Risks. https://www.legalnodes.com/article/eu-ai-act-2026-updates-compliance-requirements-and-business-risks
- Covasant. (April 2026). EU AI Act Compliance for Autonomous AI Agents in 2026. https://www.covasant.com/blogs/eu-ai-act-compliance-autonomous-agents-enterprise-2026
- Raconteur. (April 2026). EU AI Act Compliance: A Technical Audit Guide for the 2026 Deadline. https://www.raconteur.net/global-business/eu-ai-act-compliance-a-technical-audit-guide-for-the-2026-deadline
- Sombra. (October 2025). An Ultimate Guide to AI Regulations and Governance in 2026. https://sombrainc.com/blog/ai-regulations-2026-eu-ai-act
- NIST. AI Risk Management Framework (AI RMF 1.0). https://www.nist.gov/system/files/documents/2023/01/26/AI%20RMF%201.0.pdf
- Tamr. Breaking Down the EU Artificial Intelligence Act: What Businesses Need to Know. https://www.tamr.com/blog/breaking-down-the-eu-artificial-intelligence-act-what-businesses-need-to-know-and-do
- Ranjan Kumar. (April 2026). Unified Observability Across Agent Fleets. https://ranjankumar.in/ai-control-plane-unified-observability-agent-fleet
- Ranjan Kumar. (April 2026). Global Policy Enforcement vs. Per-Agent Gate Rules. https://ranjankumar.in/ai-control-plane-global-policy-enforcement-per-agent-gate-rules
- Ranjan Kumar. (April 2026). Multi-Agent Pipeline Orchestration and Failure Propagation. https://ranjankumar.in/ai-control-plane-multi-agent-pipeline-orchestration-failure-propagation
- Ranjan Kumar. (April 2026). Agent Versioning and Deployment Strategies. https://ranjankumar.in/ai-control-plane-agent-versioning-deployment-strategies
- Ranjan Kumar. (April 2026). Cost Governance and Budget Allocation Across Agent Types. https://ranjankumar.in/ai-control-plane-cost-governance-budget-allocation-agent-types
- Ranjan Kumar. (April 2026). Gated Execution: Why Your Agent Should Never Act Without Permission. https://ranjankumar.in/harness-engineering-gated-execution-llm-agents-policy-safety
Related Articles
- Cost Governance and Budget Allocation Across Agent Types: Token Spend Is Infrastructure Spend
- Unified Observability Across Agent Fleets: Building the Control Plane Metric Layer
- Agent Versioning and Deployment Strategies: Shipping Agent Updates Without Breaking Running Pipelines