Guides
In-depth, practical guides on AI/ML engineering, agentic systems, and production deployments
Context Engineering: The Skill That Separates Production Agents from Demos
Prompt engineering tells the model what to do. Context engineering determines whether it can actually do it.
Agent Skills Are Not Prompts. They Are Production Knowledge Infrastructure.
Every team is re-teaching their agent the same workflows on every call. Skills are how you stop paying that tax.
Subagents: How to Run Parallelism Inside a Single Agent Session Without Poisoning the Parent
Every subagent burns its own context so the parent doesn't have to. That's the entire architecture.
Hooks: The Enforcement Layer That Turns Agent Policy Into Agent Fact
Prompts suggest. Hooks enforce. Until you know the difference, your agent's safety guarantees are probabilistic.
Which Claude Code Layer Solves Your Problem? A Diagnostic Guide for AI Engineers
Reaching for a subagent when you needed a skill is the most common mistake teams make. Here is how to stop making it.
Four Habits from the Creator of Claude Code That Will Change How You Ship
Boris Cherny runs 10-15 parallel sessions, ships 20-30 PRs a day, and calls his setup 'surprisingly vanilla.' The gap is not configuration. It is the operating model.
You Can't Debug What You Can't See: Observability for Claude Code Sessions
Most Claude Code failures leave no trace. Here is how to build the audit trail that tells you exactly what happened, why it went wrong, and how to stop it happening again.
How to Know Your Claude Code Setup Actually Works: Testing Beyond the Skill Level
Skill evals tell you a skill works in isolation. They do not tell you whether your agent produces consistently good code. That requires a different kind of test.
Unified Observability Across Agent Fleets: Building the Control Plane Metric Layer
Teams running agent fleets think they have observability because they have traces. They don't - they have logging. Here's what the difference costs you in production.
Global Policy Enforcement vs. Per-Agent Gate Rules: Two Layers That Must Not Collapse Into One
Treating fleet-wide policy and per-agent gate logic as the same problem is how you end up with governance theater and brittle agents at the same time.
Multi-Agent Pipeline Orchestration and Failure Propagation: Designing for Blast Radius
Retry logic tells an agent what to do when it fails. A pipeline halt protocol tells the entire fleet what to do. Most production systems only have one of these.
Agent Versioning and Deployment Strategies: Shipping Agent Updates Without Breaking Running Pipelines
Deploying a new agent version into a live multi-agent pipeline is not a software deployment. It is a distributed state migration - and most teams treat it like the former.
Cost Governance and Budget Allocation Across Agent Types: Token Spend Is Infrastructure Spend
Most teams discover their agent fleet's true cost on the invoice. By then, three budget cycles of misconfigured pipelines have already run.
Compliance, Audit Trails, and Regulatory Requirements for Agentic Systems
Full enforcement of the EU AI Act begins August 2, 2026. The gap between running agents and running auditable agents is not a documentation problem. It is an architectural one.
Harness Engineering: The Missing Layer Between LLMs and Production Systems
Why AI systems don't fail at the model layer - and how designing the right execution harness turns brittle prompts into reliable infrastructure
Normalization and Input Defense: Hardening the Entry Point of Your LLM System
Every unreliable LLM system has a porous entry point. Here's how to build the layer that ensures the model only ever sees clean, controlled, safe input.
Context Engineering: What the Model Sees Is What the Model Does
The 'Lost in the Middle' problem isn't a model bug. It's a context design failure - and fixing it requires treating the context window as managed infrastructure, not a dump bucket.
Gated Execution: Why Your Agent Should Never Act Without Permission
Valid output is not safe output. The Gated Execution layer is the firewall between what the model proposes and what the system actually does - and it's the difference between an agent that assists and one that causes incidents.
Validation Layer Design: Building the Reflex That Catches What the Model Gets Wrong
The model will produce malformed output. Not occasionally - regularly. The Validation Layer is the only thing standing between that malformed output and your downstream systems.
Retry, Fallback, and Circuit Breaking: Building LLM Infrastructure That Survives Outages
Your LLM provider will have an incident. The question is not whether your system fails when that happens - it's whether you designed for it beforehand.
State Management for Agentic Systems: How to Build Agents That Don't Start Over
A long-running agent without state management is a gamble. You're betting the entire task completes before something goes wrong. At production scale, that bet loses constantly.
Deterministic Constraint Systems: Building Tool Registries That Keep Agents in Scope
The model will try to use tools it doesn't have. It will call APIs with parameters that don't exist. It will invent capabilities. The constraint system is how you make the gap between what the model thinks it can do and what it can actually do exactly zero.
From Unknown Codebase to Architecture Doc, Automated - Building the LangGraph Pipeline
How ArchLens - a 12-node LangGraph pipeline - turns any Git repository into a validated architecture document: state design, chunking logic, all four validation gates, human-in-the-loop review, and production-ready error recovery
From Unknown Codebase to Architecture Document: A Complete Practitioner's Guide
A 3-pass methodology for compressing any codebase - in any language, any architecture style - into validated diagrams, debt scores, and decisions that engineering teams and stakeholders can actually act on