
Consequence Modeling for Agent Systems: Predicting Action Impact Before Execution

#consequence-modeling #impact-prediction #agent-safety #dry-run-execution #risk-assessment #blast-radius #rollback-mechanisms #production-ai #decision-support

An agent decided to clean up old database records. It identified 50,000 "inactive" user accounts based on a 90-day activity threshold. It executed DELETE FROM users WHERE last_activity < NOW() - INTERVAL 90 DAY. The query ran successfully. We realized 10 minutes later that "inactive" users included enterprise customers on annual contracts who authenticate once per quarter. We'd just deleted 12,000 paying customers.

The agent had permission to delete. The logic seemed reasonable. The SQL was syntactically correct. But we had no mechanism to predict consequences before execution. No simulation of "what happens if this runs." No estimation of blast radius. No dry-run validation showing which specific records would be affected.

Post-incident analysis was straightforward: if we'd shown the agent "this will delete 50,000 users including 12,000 with active subscriptions," it would have reconsidered. Or we would have caught it in review. But we only logged what the agent did, not what it was about to do.

This is the fundamental gap in agent architectures. Agents propose actions based on reasoning over context. Those actions execute immediately if authorized. There's no intermediate step modeling consequences—predicting impact, estimating scope, simulating outcomes. We treat agents like deterministic systems where code review catches problems. But agent decisions are non-deterministic. You can't review them in advance. You need runtime consequence prediction.

The solution is consequence modeling: computational systems that predict action impact before execution. Not philosophical "awareness" but engineering—simulation engines, impact estimators, dry-run validators, rollback analyzers. Build decision support that shows agents (and humans) what will actually happen if an action executes, then use that prediction to inform go/no-go decisions.

The Prediction Problem: Actions Have Hidden Consequences

Traditional systems have predictable consequences. A function call with specific parameters produces deterministic outputs. Code paths are traceable. Side effects are documented. You can reason about impact through static analysis and testing.

Agents break this model in three ways:

Operations are context-dependent. The same tool call has different consequences based on runtime state. DELETE FROM users WHERE inactive=true might delete 10 users or 10,000 depending on current database state. The agent can't know without querying first.

Scope emerges from parameters. An agent decides to "clean up temporary files" and proposes rm -rf /tmp/*. Seems reasonable. But /tmp contains active session data for 1,000 concurrent users. The scope of impact (all active sessions) isn't visible from the command alone.

Cascading effects are opaque. Deleting a user triggers foreign key cascades deleting their orders, sessions, preferences, and audit logs. The agent sees "delete user" not "delete user plus 47 related records across 8 tables." Cascading consequences are invisible until execution.
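The cascade problem can be made concrete: given foreign-key metadata, the full blast radius of a delete is the transitive closure of CASCADE edges. A minimal sketch, where the `fk_cascades` map is hypothetical example schema metadata:

```python
# Hypothetical schema metadata: table -> tables reached via ON DELETE CASCADE
fk_cascades = {
    "users": ["orders", "sessions", "preferences", "audit_logs"],
    "orders": ["order_items", "invoices"],
    "sessions": [],
    "preferences": [],
    "audit_logs": [],
    "order_items": [],
    "invoices": [],
}

def cascade_closure(table: str, fk_map: dict) -> set:
    """Return every table a delete on `table` can reach through CASCADE edges."""
    seen, stack = set(), [table]
    while stack:
        current = stack.pop()
        for child in fk_map.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

# "Delete user" is really "delete user plus rows in 6 related tables"
print(sorted(cascade_closure("users", fk_cascades)))
```

The agent sees one DELETE; the closure shows six downstream tables before anything runs.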

The correct mental model: Actions as predictions with confidence intervals

Traditional execution: "Run this operation" → It runs → Results appear

Consequence modeling: "Run this operation" → Predict impact → Estimate confidence → Show prediction → Decide whether to proceed

The shift is from immediate execution to prediction-informed execution. Before running an action, model what will happen. How many records affected? What cascade effects? What's the estimated cost? What can be rolled back? Use these predictions to make better decisions about whether to execute.
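The prediction-informed flow reduces to a thin wrapper around execution. A sketch, where `predict`, `execute`, and `approve` are caller-supplied hooks and the field names are illustrative:

```python
def execute_with_prediction(action, predict, execute, approve):
    """Model the action's impact first, then make a go/no-go decision."""
    prediction = predict(action)        # e.g. rows affected, cascades, cost
    if not approve(prediction):         # decision is informed by the prediction
        return {"status": "cancelled", "prediction": prediction}
    return {"status": "executed", "prediction": prediction,
            "result": execute(action)}

# Toy usage: refuse anything predicted to touch 1,000+ rows
outcome = execute_with_prediction(
    action={"sql": "DELETE FROM users WHERE inactive=1"},
    predict=lambda a: {"rows_affected": 50_000},
    execute=lambda a: "done",
    approve=lambda p: p["rows_affected"] < 1_000,
)
print(outcome["status"])  # the 50,000-row delete is cancelled, not executed
```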

Key insight: You can't prevent all bad decisions, but you can prevent uninformed decisions

An agent might still decide to delete those 50,000 users after seeing the prediction. But it won't do it accidentally because it didn't know the scope. Consequence modeling doesn't eliminate risk—it makes risk visible and quantifiable.

The invariant to maintain:

No destructive operation executes without first predicting and logging its consequences. The prediction becomes part of the decision process and the audit trail.
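One way to enforce that invariant is a guard that computes and logs the prediction before the wrapped operation runs. A sketch, with a hypothetical `truncate_table` operation:

```python
import functools
import json
import logging

def requires_prediction(predict):
    """Wrap a destructive operation so its predicted consequences are
    computed and logged before the operation runs -- never after."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            prediction = predict(*args, **kwargs)
            # The prediction enters the audit trail before execution
            logging.info("consequence prediction: %s", json.dumps(prediction))
            wrapper.last_prediction = prediction
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@requires_prediction(lambda table: {"operation": "truncate", "table": table})
def truncate_table(table):
    return f"truncated {table}"

truncate_table("sessions")
```

Because the decorator runs the predictor first, there is no code path where the operation executes without a logged prediction.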

Consequence Modeling Architecture

A consequence modeling system intercepts proposed actions, simulates their execution, predicts impact across multiple dimensions, and surfaces predictions for decision-making.

graph TB
    A[Agent Proposes Action] --> B[Consequence Modeler]
    
    B --> C[Action Analyzer]
    C --> D[Extract Operation Type]
    C --> E[Parse Parameters]
    C --> F[Identify Target Resources]
    
    B --> G[Impact Simulator]
    G --> H[Dry-Run Executor]
    G --> I[Cascade Predictor]
    G --> J[Cost Estimator]
    
    H --> K[Simulation Results]
    I --> K
    J --> K
    
    K --> L[Impact Report]
    L --> M[Records Affected]
    L --> N[Cascade Effects]
    L --> O[Estimated Cost]
    L --> P[Rollback Feasibility]
    L --> Q[Risk Score]
    
    L --> R{Risk Assessment}
    
    R -->|Low Risk| S[Auto-Execute]
    R -->|Medium Risk| T[Show Prediction to Agent]
    R -->|High Risk| U[Require Human Approval]
    R -->|Critical Risk| V[Block Execution]
    
    T --> W{Agent Confirms?}
    W -->|Yes| S
    W -->|No| X[Cancel]
    
    U --> Y[Human Review]
    Y --> S
    Y --> X
    
    S --> Z[Execute with Monitoring]
    Z --> AA[Actual Results]
    
    AA --> AB[Compare Prediction vs Actual]
    AB --> AC[Update Prediction Models]
    
    AD[Prediction Accuracy Metrics] -.-> AB
    AE[Audit Trail] -.-> L
    AE -.-> AA
    
    style A fill:#7B68EE,stroke:#5A4CB8,color:#fff
    style B fill:#FFD93D,stroke:#CCB030,color:#333
    style C fill:#98D8C8,stroke:#6FB8A8,color:#333
    style G fill:#98D8C8,stroke:#6FB8A8,color:#333
    style H fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style I fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style J fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style K fill:#9B59B6,stroke:#7D3C98,color:#fff
    style L fill:#6BCF7F,stroke:#4BA563,color:#fff
    style M fill:#98D8C8,stroke:#6FB8A8,color:#333
    style N fill:#98D8C8,stroke:#6FB8A8,color:#333
    style O fill:#98D8C8,stroke:#6FB8A8,color:#333
    style P fill:#98D8C8,stroke:#6FB8A8,color:#333
    style Q fill:#98D8C8,stroke:#6FB8A8,color:#333
    style R fill:#E74C3C,stroke:#C0392B,color:#fff
    style S fill:#6BCF7F,stroke:#4BA563,color:#fff
    style T fill:#FFA07A,stroke:#CC7F62,color:#fff
    style U fill:#FF6B6B,stroke:#CC5555,color:#fff
    style V fill:#E74C3C,stroke:#C0392B,color:#fff
    style Z fill:#95A5A6,stroke:#7B8D8E,color:#fff
    style AA fill:#4A90E2,stroke:#2E5C8A,color:#fff

Component responsibilities:

Consequence Modeler: Intercepts all proposed actions before execution. Routes them through prediction pipeline.

Action Analyzer: Parses proposed operation to understand type (read/write/delete), parameters, and target resources.

Impact Simulator: Runs prediction engines to estimate consequences without executing.

Dry-Run Executor: Executes operation in simulation mode (transactions without commit, database EXPLAIN queries, etc.).

Cascade Predictor: Analyzes foreign keys, triggers, webhooks—anything that causes secondary effects.

Cost Estimator: Predicts monetary cost, latency, resource consumption.

Impact Report: Structured prediction showing all dimensions of consequence—scope, cascades, cost, rollback feasibility, risk.

Risk Assessment: Categorizes predicted impact into risk levels determining handling.

Prediction vs Actual Comparison: After execution, compares predictions to actual outcomes. Feeds back into model improvement.

Key architectural properties:

Interception before execution: All actions flow through consequence modeling. No bypass paths.

Multi-dimensional prediction: Scope, cascades, cost, rollback—comprehensive impact view.

Risk-based routing: Low-risk proceeds automatically, high-risk requires approval, critical-risk blocks.

Continuous learning: Prediction accuracy improves over time by comparing predictions to actual outcomes.

Audit trail: Both predictions and actual results logged. Post-incident analysis shows what was predicted vs what happened.
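Risk-based routing from the diagram reduces to a small, auditable function. The thresholds below are illustrative, not prescriptive:

```python
def route_by_risk(risk_score: float) -> str:
    """Map a predicted risk score (0.0-1.0) to a handling path.
    Threshold values are example choices, tuned per deployment."""
    if risk_score < 0.3:
        return "auto_execute"
    if risk_score < 0.6:
        return "show_prediction_to_agent"
    if risk_score < 0.9:
        return "require_human_approval"
    return "block"
```

Keeping the routing logic this small makes the policy easy to review and to log alongside each decision.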

Implementation: Building Consequence Predictors

Here's what consequence modeling looks like in production for different operation types.

Database Operation Consequence Modeling

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional
import re

import sqlparse


class OperationType(Enum):
    SELECT = "select"
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"
    DDL = "ddl"


@dataclass
class DatabaseConsequence:
    """Predicted consequences of a database operation."""
    operation_type: OperationType
    affected_tables: List[str]
    estimated_rows_affected: int
    cascade_operations: List[Dict[str, Any]]
    rollback_feasible: bool
    estimated_duration_ms: float
    risk_score: float
    warning_messages: List[str]


class DatabaseConsequenceModeler:
    """
    Predicts consequences of database operations before execution.
    """

    def __init__(self, db_connection):
        self.db = db_connection
        # SchemaInspector (defined elsewhere) exposes get_foreign_keys()
        # and get_triggers() for cascade analysis.
        self.schema_inspector = SchemaInspector(db_connection)

    def predict_consequences(
        self,
        sql_query: str,
        parameters: Optional[Dict] = None
    ) -> DatabaseConsequence:
        """
        Predict what will happen if this SQL executes.
        """
        # Parse query to understand the operation
        parsed = sqlparse.parse(sql_query)[0]
        op_type = self._identify_operation_type(parsed)

        # Get affected tables
        tables = self._extract_tables(parsed)

        # Dry-run to estimate affected rows
        estimated_rows = self._estimate_affected_rows(sql_query, parameters)

        # Predict cascade effects
        cascades = self._predict_cascades(tables, op_type)

        # Check rollback feasibility
        rollback_ok = self._check_rollback_feasibility(op_type, estimated_rows)

        # Estimate duration
        duration = self._estimate_duration(sql_query, estimated_rows)

        # Calculate risk score
        risk = self._calculate_risk_score(
            op_type, estimated_rows, len(cascades), rollback_ok
        )

        # Generate warnings
        warnings = self._generate_warnings(
            op_type, estimated_rows, cascades, tables
        )

        return DatabaseConsequence(
            operation_type=op_type,
            affected_tables=tables,
            estimated_rows_affected=estimated_rows,
            cascade_operations=cascades,
            rollback_feasible=rollback_ok,
            estimated_duration_ms=duration,
            risk_score=risk,
            warning_messages=warnings
        )

    def _estimate_affected_rows(
        self,
        query: str,
        parameters: Optional[Dict]
    ) -> int:
        """
        Estimate how many rows will be affected without executing.
        Uses EXPLAIN or dry-run query transformation.
        """
        # For DELETE/UPDATE, transform to SELECT COUNT(*)
        if 'DELETE' in query.upper():
            # DELETE FROM users WHERE inactive=true
            # -> SELECT COUNT(*) FROM users WHERE inactive=true
            count_query = re.sub(
                r'DELETE\s+FROM\s+(\w+)\s+WHERE\s+(.+)',
                r'SELECT COUNT(*) FROM \1 WHERE \2',
                query,
                flags=re.IGNORECASE
            )
        elif 'UPDATE' in query.upper():
            # UPDATE users SET active=false WHERE last_login < ...
            # -> SELECT COUNT(*) FROM users WHERE last_login < ...
            count_query = re.sub(
                r'UPDATE\s+(\w+)\s+SET\s+.+?\s+WHERE\s+(.+)',
                r'SELECT COUNT(*) FROM \1 WHERE \2',
                query,
                flags=re.IGNORECASE
            )
        else:
            return 0

        # Execute the count query
        result = self.db.execute(count_query, parameters or {})
        return result.fetchone()[0]

    def _predict_cascades(
        self,
        tables: List[str],
        operation: OperationType
    ) -> List[Dict[str, Any]]:
        """
        Predict cascade effects from foreign keys and triggers.
        """
        cascades = []

        for table in tables:
            # Get foreign key relationships
            fk_relationships = self.schema_inspector.get_foreign_keys(table)

            for fk in fk_relationships:
                if fk['on_delete'] == 'CASCADE' and operation == OperationType.DELETE:
                    cascades.append({
                        'type': 'foreign_key_cascade',
                        'source_table': table,
                        'target_table': fk['referred_table'],
                        'action': 'DELETE',
                        'description': f"Deleting from {table} will cascade delete to {fk['referred_table']}"
                    })
                elif fk['on_update'] == 'CASCADE' and operation == OperationType.UPDATE:
                    cascades.append({
                        'type': 'foreign_key_cascade',
                        'source_table': table,
                        'target_table': fk['referred_table'],
                        'action': 'UPDATE',
                        'description': f"Updating {table} will cascade update to {fk['referred_table']}"
                    })

            # Check for triggers
            triggers = self.schema_inspector.get_triggers(table)
            for trigger in triggers:
                cascades.append({
                    'type': 'trigger',
                    'table': table,
                    'trigger_name': trigger['name'],
                    'description': f"Trigger {trigger['name']} will execute on {operation.value}"
                })

        return cascades

    def _calculate_risk_score(
        self,
        operation: OperationType,
        rows_affected: int,
        cascade_count: int,
        rollback_feasible: bool
    ) -> float:
        """
        Calculate a risk score 0.0-1.0 based on predicted impact.
        """
        score = 0.0

        # Base risk by operation type
        if operation == OperationType.DELETE:
            score += 0.4
        elif operation == OperationType.UPDATE:
            score += 0.3
        elif operation == OperationType.DDL:
            score += 0.5

        # Scale risk
        if rows_affected > 10000:
            score += 0.3
        elif rows_affected > 1000:
            score += 0.2
        elif rows_affected > 100:
            score += 0.1

        # Cascade risk
        score += min(cascade_count * 0.1, 0.3)

        # Rollback feasibility
        if not rollback_feasible:
            score += 0.2

        return min(score, 1.0)

    def _generate_warnings(
        self,
        operation: OperationType,
        rows_affected: int,
        cascades: List[Dict],
        tables: List[str]
    ) -> List[str]:
        """
        Generate human-readable warnings about consequences.
        """
        warnings = []

        if rows_affected > 10000:
            warnings.append(
                f"⚠️  Large impact: {rows_affected:,} rows will be affected"
            )

        if operation == OperationType.DELETE and rows_affected > 0:
            warnings.append(
                f"⚠️  Destructive: {rows_affected:,} rows will be permanently deleted"
            )

        if cascades:
            cascade_tables = set(c.get('target_table', c.get('table')) for c in cascades)
            warnings.append(
                f"⚠️  Cascade effects: {len(cascades)} operations on {len(cascade_tables)} related tables"
            )

        # Check for sensitive tables
        sensitive_patterns = ['user', 'customer', 'payment', 'order']
        for table in tables:
            if any(pattern in table.lower() for pattern in sensitive_patterns):
                warnings.append(
                    f"⚠️  Sensitive data: Operation affects {table} table"
                )

        return warnings

    def _identify_operation_type(self, parsed_query) -> OperationType:
        """Identify SQL operation type from the parsed query."""
        first_token = parsed_query.token_first(skip_ws=True, skip_cm=True)
        if first_token:
            op = first_token.value.upper()
            if op == 'SELECT':
                return OperationType.SELECT
            elif op == 'INSERT':
                return OperationType.INSERT
            elif op == 'UPDATE':
                return OperationType.UPDATE
            elif op == 'DELETE':
                return OperationType.DELETE
            elif op in ('CREATE', 'ALTER', 'DROP'):
                return OperationType.DDL
        return OperationType.SELECT

    def _extract_tables(self, parsed_query) -> List[str]:
        """Extract table names from parsed SQL."""
        # Simplified extraction - production would be more robust
        tables = []
        for token in parsed_query.tokens:
            if isinstance(token, sqlparse.sql.Identifier):
                tables.append(token.get_real_name())
        return tables

    def _check_rollback_feasibility(
        self,
        operation: OperationType,
        rows_affected: int
    ) -> bool:
        """Check if the operation can be rolled back."""
        # DDL operations often can't be rolled back
        if operation == OperationType.DDL:
            return False
        # Very large operations may be impractical to roll back
        if rows_affected > 100000:
            return False
        return True

    def _estimate_duration(self, query: str, rows: int) -> float:
        """Estimate query execution time."""
        # Use EXPLAIN or historical query stats
        # Simplified: assume 0.1ms per row
        return rows * 0.1
```

Why this works:

Dry-run prediction: Transforms destructive queries to SELECT COUNT(*) to estimate scope without executing.

Cascade analysis: Inspects foreign keys and triggers to predict secondary effects.

Risk scoring: Combines multiple factors (operation type, scale, cascades) into quantitative risk.

Human-readable warnings: Translates technical predictions into actionable warnings.
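The dry-run transformation is easy to verify end-to-end against an in-memory SQLite database. This standalone sketch extracts just the DELETE/UPDATE-to-COUNT rewrite from the modeler above:

```python
import re
import sqlite3

def to_count_query(query: str) -> str:
    """Rewrite a DELETE or UPDATE into a SELECT COUNT(*) dry-run,
    using the same regex transformation as the modeler above."""
    q = re.sub(r'DELETE\s+FROM\s+(\w+)\s+WHERE\s+(.+)',
               r'SELECT COUNT(*) FROM \1 WHERE \2', query, flags=re.IGNORECASE)
    if q != query:
        return q
    return re.sub(r'UPDATE\s+(\w+)\s+SET\s+.+?\s+WHERE\s+(.+)',
                  r'SELECT COUNT(*) FROM \1 WHERE \2', query, flags=re.IGNORECASE)

# Build a toy table: 10 users, half marked inactive
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, inactive INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(i, i % 2) for i in range(10)])

delete_sql = "DELETE FROM users WHERE inactive=1"
predicted = db.execute(to_count_query(delete_sql)).fetchone()[0]
print(f"Predicted rows affected: {predicted}")  # 5 -- without deleting anything
```

The prediction runs against live data but commits nothing, which is exactly the property the dry-run executor needs.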

File System Operation Consequence Modeling

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class FileSystemConsequence:
    """Predicted consequences of a file system operation."""
    operation: str
    paths_affected: List[str]
    total_files: int
    total_size_bytes: int
    critical_files_affected: List[str]
    rollback_feasible: bool
    estimated_duration_seconds: float
    risk_score: float
    warnings: List[str]


class FileSystemConsequenceModeler:
    """
    Predicts consequences of file system operations.
    """

    def __init__(self):
        self.critical_paths = [
            '/etc', '/var/log', '/home', '/usr/bin',
            '/System', 'C:\\Windows', 'C:\\Program Files'
        ]

    def predict_consequences(
        self,
        operation: str,
        path: str,
        recursive: bool = False
    ) -> FileSystemConsequence:
        """
        Predict what will happen if this file operation executes.
        """
        affected_paths = self._enumerate_affected_paths(path, recursive)

        total_files = len([p for p in affected_paths if os.path.isfile(p)])
        total_size = sum(
            os.path.getsize(p) for p in affected_paths
            if os.path.isfile(p)
        )

        critical_files = self._identify_critical_files(affected_paths)

        rollback_ok = self._check_rollback_feasibility(
            operation, total_files, total_size
        )

        duration = self._estimate_duration(operation, total_files, total_size)

        risk = self._calculate_risk(
            operation, total_files, len(critical_files), path
        )

        warnings = self._generate_warnings(
            operation, total_files, total_size, critical_files, path
        )

        return FileSystemConsequence(
            operation=operation,
            paths_affected=affected_paths[:100],  # Limit for display
            total_files=total_files,
            total_size_bytes=total_size,
            critical_files_affected=critical_files,
            rollback_feasible=rollback_ok,
            estimated_duration_seconds=duration,
            risk_score=risk,
            warnings=warnings
        )

    def _enumerate_affected_paths(
        self,
        path: str,
        recursive: bool
    ) -> List[str]:
        """
        List all paths that will be affected.
        """
        path_obj = Path(path)

        if not path_obj.exists():
            return []

        if path_obj.is_file():
            return [str(path_obj)]

        if recursive:
            # Walk the directory tree
            paths = []
            for root, dirs, files in os.walk(path):
                paths.extend([os.path.join(root, f) for f in files])
                paths.extend([os.path.join(root, d) for d in dirs])
            return paths
        else:
            # Just immediate children
            return [str(p) for p in path_obj.iterdir()]

    def _identify_critical_files(self, paths: List[str]) -> List[str]:
        """Identify paths that are critical system files."""
        critical = []
        for path in paths:
            if any(path.startswith(cp) for cp in self.critical_paths):
                critical.append(path)
        return critical

    def _calculate_risk(
        self,
        operation: str,
        file_count: int,
        critical_count: int,
        path: str
    ) -> float:
        """Calculate a risk score for the file operation."""
        score = 0.0

        # Base risk by operation
        if operation in ('delete', 'rm'):
            score += 0.5
        elif operation in ('move', 'mv'):
            score += 0.3

        # Scale risk
        if file_count > 1000:
            score += 0.3
        elif file_count > 100:
            score += 0.2

        # Critical files
        if critical_count > 0:
            score += 0.4

        # System paths
        if any(path.startswith(cp) for cp in self.critical_paths):
            score += 0.3

        return min(score, 1.0)

    def _generate_warnings(
        self,
        operation: str,
        file_count: int,
        total_size: int,
        critical_files: List[str],
        path: str
    ) -> List[str]:
        """Generate warnings about the file operation."""
        warnings = []

        if file_count > 1000:
            warnings.append(
                f"⚠️  Large operation: {file_count:,} files affected"
            )

        if total_size > 1_000_000_000:  # 1GB
            warnings.append(
                f"⚠️  Large data: {total_size / 1_000_000_000:.2f}GB will be affected"
            )

        if critical_files:
            warnings.append(
                f"⚠️  Critical files: {len(critical_files)} system files affected"
            )
            if len(critical_files) <= 5:
                for cf in critical_files:
                    warnings.append(f"    - {cf}")

        if operation in ('delete', 'rm'):
            warnings.append(
                "⚠️  Irreversible: Files will be permanently deleted"
            )

        return warnings

    def _check_rollback_feasibility(
        self,
        operation: str,
        files: int,
        size: int
    ) -> bool:
        """Check if the operation can be rolled back."""
        if operation in ('delete', 'rm'):
            # Can't roll back a deletion
            return False
        if size > 10_000_000_000:  # 10GB
            # Too large to practically roll back
            return False
        return True

    def _estimate_duration(
        self,
        operation: str,
        files: int,
        size: int
    ) -> float:
        """Estimate operation duration."""
        # Rough estimates
        if operation in ('delete', 'rm'):
            return files * 0.001  # 1ms per file
        elif operation in ('copy', 'cp'):
            return size / 100_000_000  # 100MB/s
        return files * 0.01
```

Key patterns:

Pre-enumeration: List all affected paths before executing to show exact scope.

Critical file detection: Identify system files that should never be touched.

Size-based risk: Large operations (1000+ files, 1GB+ data) automatically increase risk.

Irreversibility detection: Operations that can't be undone (delete) flagged explicitly.
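Pre-enumeration can be exercised safely against a throwaway directory. A standalone sketch using the same `os.walk` approach as the modeler above:

```python
import os
import tempfile
from pathlib import Path

def enumerate_affected(path: str, recursive: bool = True) -> list:
    """Pre-enumerate every path a destructive operation would touch."""
    p = Path(path)
    if not p.exists():
        return []
    if p.is_file():
        return [str(p)]
    if recursive:
        found = []
        for root, dirs, files in os.walk(path):
            found.extend(os.path.join(root, name) for name in files + dirs)
        return found
    return [str(child) for child in p.iterdir()]

# Simulate /tmp holding active session files, then show the exact scope
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"session_{i}.dat").write_text("active session")
    affected = enumerate_affected(tmp)
    print(f"rm -rf {tmp} would remove {len(affected)} paths")
```

Showing the enumerated list (or its count) is what turns "clean up temporary files" into a reviewable decision.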

Pitfalls & Failure Modes

Consequence modeling systems fail in production through predictable patterns.

Prediction Inaccuracy Breeds Distrust

Consequence modeler predicts "50 rows affected." Actual execution affects 5,000 rows. This happens repeatedly. Users stop trusting predictions. They ignore warnings. A critical prediction is dismissed as another false alarm. Disaster ensues.

Why it happens: Prediction models don't account for edge cases, data distributions change, cascade effects are underestimated.

Prevention: Track prediction accuracy. Compare predicted vs actual for every operation. Retrain prediction models when accuracy drops below 80%. Surface confidence intervals, not point estimates.
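A minimal accuracy tracker might look like this; the ratio-based accuracy metric and the 0.8 threshold are illustrative choices:

```python
class PredictionAccuracyTracker:
    """Track predicted vs actual impact; flag when accuracy drops
    below a retraining threshold."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.records = []  # (predicted, actual) pairs

    def record(self, predicted: int, actual: int):
        self.records.append((predicted, actual))

    def accuracy(self) -> float:
        """Mean per-operation accuracy: 1.0 when predicted == actual,
        shrinking toward 0.0 as the estimate diverges."""
        if not self.records:
            return 1.0
        scores = [
            1.0 if p == a == 0 else min(p, a) / max(p, a)
            for p, a in self.records
        ]
        return sum(scores) / len(scores)

    def needs_retraining(self) -> bool:
        return self.accuracy() < self.threshold

tracker = PredictionAccuracyTracker(threshold=0.8)
tracker.record(predicted=50, actual=5000)   # badly underestimated
tracker.record(predicted=100, actual=100)   # exact
print(f"accuracy={tracker.accuracy():.2f} retrain={tracker.needs_retraining()}")
```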

Dry-Run Execution Has Side Effects

"Dry-run" database query uses transaction that locks tables. Locks aren't released promptly. Production queries block waiting for dry-run to complete. System grinds to halt.

Why it happens: Even read-only operations can have locking side effects depending on isolation levels and database implementation.

Prevention: Dry-runs must use snapshot isolation or read-uncommitted where possible. Set aggressive timeouts (500ms max for dry-runs). Monitor for lock contention.
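Client-side, a hard deadline keeps a stalled dry-run from blocking the pipeline. A sketch using a worker thread; note this only stops waiting for the result, so pair it with a server-side limit (e.g. PostgreSQL's statement_timeout) to actually cancel the query:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def dry_run_with_deadline(fn, timeout_s: float = 0.5):
    """Give a dry-run callable a hard deadline; a slow prediction becomes
    'prediction unavailable' instead of a production stall."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=timeout_s)
    except TimeoutError:
        return None  # fail on the prediction, not on production traffic
    finally:
        pool.shutdown(wait=False)

fast = dry_run_with_deadline(lambda: 42, timeout_s=0.5)
slow = dry_run_with_deadline(lambda: time.sleep(2) or 42, timeout_s=0.1)
print(fast, slow)
```

A `None` result should route the operation to a higher-risk path, not silently approve it.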

Prediction Latency Kills User Experience

Consequence modeling adds 500ms to every operation. Agent workflow becomes unbearably slow. Users disable consequence modeling to restore performance. System is less safe than before implementation.

Why it happens: Comprehensive prediction (cascade analysis, cost estimation, etc.) requires multiple database queries and computations.

Prevention: Prediction must complete in <100ms at p95. Use caching for schema information. Parallelize prediction steps. For low-risk operations, use lightweight fast-path prediction.
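Schema metadata changes rarely, so it is a natural cache target, and reads can skip the heavy pipeline entirely. A sketch in which the lookup body is a stand-in for a real information_schema query:

```python
from functools import lru_cache

CALLS = {"schema_lookups": 0}

@lru_cache(maxsize=1024)
def get_foreign_keys(table: str) -> tuple:
    """Stand-in for an expensive schema query; caching keeps
    repeated predictions inside the latency budget."""
    CALLS["schema_lookups"] += 1  # simulate the database round-trip
    return (f"fk_{table}_orders",)

def is_fast_path(sql: str) -> bool:
    """Reads get lightweight prediction; only writes pay for the full pipeline."""
    return sql.lstrip().upper().startswith("SELECT")

get_foreign_keys("users")
get_foreign_keys("users")  # cache hit: no second round-trip
print(CALLS["schema_lookups"], is_fast_path("SELECT * FROM users"))
```

In production the cache needs invalidation on schema migrations, which this sketch omits.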

False Negatives Create False Security

Consequence modeler predicts low risk for operation that causes catastrophic failure. Team trusts the prediction. Operation executes. Disaster happens. Post-incident shows modeler missed critical cascade effect.

Why it happens: Prediction models can't account for every possible interaction, especially custom application logic, external webhooks, or undocumented dependencies.

Prevention: Consequence modeling is defense-in-depth, not a silver bullet. Maintain other safety mechanisms (backups, rollback capabilities, monitoring). Never disable existing safeguards because consequence modeling exists.

Prediction Report Information Overload

Consequence report shows 50 warnings, 20 cascade effects, 15 related tables. Agent (or human reviewer) can't parse the information. Either ignores it all or gets paralyzed by detail. Neither outcome improves safety.

Why it happens: Comprehensive prediction generates lots of data. Presenting all of it overwhelms decision-makers.

Prevention: Graduated detail levels. Show critical warnings prominently (max 3). Hide technical details in expandable sections. Use visual indicators (risk scores, color coding). Optimize for decision-making, not completeness.
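A graduated report can be as simple as capping visible warnings and folding the rest behind a count. A sketch:

```python
def summarize_report(warnings: list, risk_score: float, max_shown: int = 3) -> str:
    """Surface at most `max_shown` warnings; fold the rest behind a count
    so the reviewer sees the critical items first."""
    lines = [f"risk score: {risk_score:.2f}"]
    lines.extend(warnings[:max_shown])
    hidden = len(warnings) - max_shown
    if hidden > 0:
        lines.append(f"... and {hidden} more (expand for details)")
    return "\n".join(lines)

report = summarize_report(
    ["12,000 paying customers affected", "cascade into 8 tables",
     "rollback infeasible", "trigger fires on delete", "audit rows removed"],
    risk_score=0.92,
)
print(report)
```

Ordering warnings by severity before truncation matters; the three shown should be the three worst.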

Summary & Next Steps

Consequence modeling makes agent actions safer by predicting impact before execution. Traditional agent architectures execute operations immediately if authorized. Consequence modeling adds a prediction layer: simulate execution, estimate scope, predict cascades, calculate risk, show predictions, then decide whether to proceed.

The implementation requires operation-specific predictors: database consequence modelers use dry-run queries and schema analysis, file system modelers enumerate affected paths, API modelers estimate cost and rate limits. Each predictor outputs structured impact reports including scope, cascades, cost, rollback feasibility, and risk score.

The architectural challenge is making prediction fast enough not to degrade user experience while accurate enough to inform decisions. Sub-100ms prediction latency is achievable with caching and parallelization. Accuracy above 80% is achievable with continuous learning from prediction vs actual comparisons.

Operational challenges include prediction inaccuracy breeding distrust, dry-run side effects, latency degradation, false negatives, and information overload. These are solvable with proper engineering but require continuous monitoring and tuning.

Here's what to build next:

Start with database operations: Highest risk, easiest to predict. Implement dry-run query transformation and cascade analysis first.

Build prediction accuracy tracking: Compare predictions to actual outcomes. Log both. Calculate accuracy metrics. This is essential for model improvement.

Create graduated risk responses: Low-risk auto-executes, medium-risk shows prediction to agent, high-risk requires human approval. Don't treat all operations identically.

Implement fast-path prediction: For genuinely low-risk operations (SELECT queries, file reads), use lightweight prediction to avoid latency penalty.

Surface predictions in agent context: Don't just log predictions—show them to agents in their decision-making process. LLMs can reason about predicted consequences.
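Surfacing a prediction to the agent can be as simple as rendering the impact report into its context window before asking for confirmation. A sketch with illustrative field names:

```python
def prediction_to_context(prediction: dict) -> str:
    """Render an impact report as plain text the agent sees before confirming."""
    return (
        f"Impact prediction: ~{prediction['rows_affected']:,} rows affected, "
        f"{len(prediction['cascades'])} cascade effect(s), "
        f"rollback feasible: {prediction['rollback_feasible']}. "
        f"Proceed only if this matches your intent."
    )

context = prediction_to_context({
    "rows_affected": 50_000,
    "cascades": ["orders", "sessions"],
    "rollback_feasible": False,
})
print(context)
```

Had the opening incident's agent seen this line before executing, the 50,000-row scope would have been part of its reasoning rather than a post-mortem finding.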

Consequence modeling is engineering, not philosophy. Build simulation engines, impact estimators, and rollback analyzers. Make risk visible and quantifiable before execution, not discoverable after failure.

