Your agent keeps using GitHub instead of Azure DevOps. You add a subagent to fix it. Now you have a subagent managing which code tool to use, and Claude still defaults to GitHub half the time because the subagent only fires when invoked, not on every session.
You needed one line in CLAUDE.md.
This is the central failure mode the slide names plainly: reaching for the wrong layer. Claude Code has five extensibility mechanisms - CLAUDE.md, MCP servers, Skills, Subagents, and Hooks - and each one solves a categorically different class of problem. They are not interchangeable. Using the wrong one does not just fail to solve the problem; it adds complexity, consumes context, and creates maintenance burden for a configuration that shouldn't exist.
The thesis of this article is simple and provable: layer selection is a diagnostic problem, not a capability problem. Before you build anything, you must correctly identify what type of problem you have. The five layers map to five problem types:
- CLAUDE.md - static facts the agent must always know
- MCP - live access to external systems the agent must pull from
- Skills - procedural expertise the agent must apply on matching tasks
- Subagents - context isolation for work that would pollute the main thread
- Hooks - deterministic enforcement of rules that cannot be probabilistic
Miss the diagnosis, and no amount of configuration quality fixes the mismatch.
## The Five Layers and What Each One Actually Does
Before the decision framework, the precise definitions - because most of the misuse patterns come from fuzzy understanding of what each layer controls.
CLAUDE.md is the always-on context layer. Every session, every task, Claude reads it from disk and holds it in the context window. This is where static project facts live: your tech stack, your architectural decisions, your repo conventions, who to contact for what, which tools are in play. It is not a place for workflows, not a place for access credentials, not a place for safety rules. It is a persistent briefing document that answers "what is this project and how does it work?"
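A minimal sketch of such a briefing document (all project details here are hypothetical):

```markdown
<!-- CLAUDE.md (hypothetical project) -->
## Stack
- TypeScript 5.x with strict mode; Node 20; pnpm workspaces

## Conventions
- Source control: Azure DevOps, not GitHub
- All timestamps in UTC; never use local timezones in business logic

## Contacts
- Payments questions: #team-payments on Slack
```

Every line is a static fact. Nothing in it is a workflow, a credential, or a safety rule.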
MCP (Model Context Protocol) is the access layer. It connects Claude to external systems - databases, APIs, monitoring tools, project management platforms, code hosts. MCP gives Claude the ability to read live Sentry errors, query a PostgreSQL schema, create Linear tickets, or pull metrics from Databricks. It does not teach Claude how to use those systems intelligently for your team. It provides the connection; it does not provide the expertise.
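A minimal `.mcp.json` sketch wiring up one server (the package shown is the reference GitHub server; swap in whatever systems your team uses):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```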
Skills are the expertise layer. A skill is a SKILL.md file (plus optional scripts and references) that encodes how your team does something - code review standards, deployment procedures, PR conventions, incident investigation workflows. Claude loads a skill only when the current task matches the skill's description. Skills carry tacit procedural knowledge; they are the difference between "Claude can write code" and "Claude writes code the way your team writes code."
Subagents are the isolation layer. They spawn fresh context windows, execute focused work inside them, and return summaries to the parent agent. The parent never sees the intermediate steps - just the result. Subagents exist to prevent the parent's context from being polluted by exploratory or verbose work. They are not a general-purpose "make Claude smarter" mechanism; they are a specific solution to context accumulation.
Hooks are the enforcement layer. They are scripts that fire at fixed lifecycle points - before a tool runs, after it runs, at session start, at session stop - and can physically block operations. Hooks are how you turn a probabilistic prompt instruction ("never use rm -rf") into a deterministic policy. They run regardless of conversation state, model reasoning, or context length.
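A hook's decision logic is plain code, not a prompt. A minimal Python sketch of that determinism (hypothetical; a real hook reads one JSON event on stdin and emits a permission decision):

```python
import json

# Hypothetical PreToolUse guard: deny any Bash command containing "rm -rf".
# The output shape mirrors Claude Code's documented deny decision.
def decide(event_json: str) -> dict:
    """Return a deny decision for dangerous commands, or {} to allow."""
    command = json.loads(event_json).get("tool_input", {}).get("command", "")
    if "rm -rf" in command:
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": "rm -rf is blocked by policy.",
            }
        }
    return {}  # empty output means the tool call proceeds
```

There is no model in the loop: the same input produces the same verdict every time.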
## The Symptom-to-Layer Diagnostic Table
The slide's table is the fastest path to correct diagnosis. Before building anything, identify your symptom:
| If the symptom is... | Reach for | Why |
|---|---|---|
| "Claude keeps using GitHub instead of Azure DevOps." | CLAUDE.md | Static facts belong at the top of the context, always loaded. |
| "Claude can't see live Sentry errors or Databricks tables." | MCP | Dynamic access to external systems, governed, pull on demand. |
| "Claude doesn't follow our house style for X." | Skill | Tacit procedural knowledge, loaded only when the task calls for it. |
| "Main context fills up before the task is done." | Subagent | Fan the heavy reading out to fresh windows; parent gets a summary. |
| "Claude shouldn't be able to do X, no matter what the prompt says." | Hook | Deterministic, not persuadable. Policy before tool call. |
Memorize the "Why" column. It is the diagnostic principle, not just the answer. When you encounter a new problem, match it to the principle - not the symptom.
## Five Scenarios, Five Diagnoses
The table gives you the framework. These scenarios build the pattern-matching instinct.
### Scenario 1: "Claude always picks the wrong test runner"

**Team:** Python monorepo. They use `pytest` with specific config flags. Claude keeps running bare `python -m unittest` or omitting required flags.

**Wrong reach:** A hook that intercepts every Bash call to check for test runner invocations. This adds latency to every bash execution, requires pattern-matching logic for dozens of test invocation forms, and still doesn't tell Claude how to configure `pytest` for the project.

**Wrong reach:** An MCP server to "connect Claude to the test suite." MCP provides access, not knowledge. The problem is not that Claude cannot reach the test runner - it is that Claude does not know how your team runs it.

**Correct diagnosis:** Static project fact + procedural convention = CLAUDE.md entry + possibly a Skill.
````markdown
<!-- CLAUDE.md -->
## Test execution

Always run tests using:

```bash
pytest tests/ -v --cov=src --cov-report=term-missing -x
```

The -x flag stops on first failure. Do not use -k to subset tests
unless explicitly asked. Never use python -m unittest.
````
If there is a more detailed testing workflow (when to run smoke tests vs full suite, how to interpret coverage thresholds, how to triage failures), that becomes a `test-runner` Skill that loads when the task involves testing. The convention goes in `CLAUDE.md`. The workflow goes in the Skill.

---

### Scenario 2: "Claude can't see our production logs"

**Team:** Runs services on AWS. Wants Claude to help debug incidents by reading CloudWatch logs and recent deployment events from their CI/CD pipeline.

**Wrong reach:** A Skill that teaches Claude how to read logs. Skills provide procedural knowledge, not access. You cannot write a `SKILL.md` that gives Claude access to a live CloudWatch stream - it can only describe how to query one if Claude already has the tool to do so.

**Wrong reach:** A subagent that "does the log reading." A subagent without tool access is just an isolated context window with no ability to reach external systems. You need the access layer first.

**Correct diagnosis:** External system access = MCP.

```json
// .mcp.json
{
  "mcpServers": {
    "cloudwatch": {
      "command": "npx",
      "args": ["-y", "@aws-mcp/cloudwatch-logs"]
    },
    "github-actions": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```

Once the MCP server is connected, Claude can pull log streams, query error patterns, and read deployment histories. If the team has a specific incident investigation workflow (which signals to check first, how to correlate deployment times with error spikes, what format to use for the incident summary), that becomes a Skill that loads on top of the MCP access. MCP gives the connection; the Skill gives the expertise to use it correctly.
---

### Scenario 3: "Claude writes PRs that don't follow our review standards"

**Team:** Has a thorough PR review checklist: specific security checks, coverage requirements, migration safety gates, documentation requirements for public APIs. Claude produces good code but the PR descriptions and review focus are inconsistent.

**Wrong reach:** CLAUDE.md with a long PR checklist. CLAUDE.md holds static facts. A 40-point PR checklist is procedural knowledge that should only load when Claude is working on a PR - not on every session, consuming context for tasks that have nothing to do with code review.

**Wrong reach:** A subagent for PR review. The problem is not context pollution - it is inconsistent expertise. A subagent in isolation with no skill definition will behave just as inconsistently as the main agent without one.

**Correct diagnosis:** Tacit procedural knowledge, task-specific = Skill.
```markdown
---
name: pr-review
description: >
  Applies team PR review standards. Use when reviewing a pull request,
  writing a PR description, or checking if a PR is ready for review.
  Do NOT trigger for general code questions or implementation tasks.
---

# PR Review Skill

## Mandatory checks (run in order, block if failing)

### Security
- Authentication: no bypass of auth middleware, no hardcoded credentials
- SQL: parameterized queries only, no string interpolation in queries
- Dependencies: no new packages without documented justification

### Coverage
- All new functions have corresponding test files
- Coverage cannot decrease from the base branch baseline

### Documentation
- Every public function modified: docstring updated
- API surface change: CHANGELOG.md entry required

### Migration safety (if database models changed)
- Migration file exists and has a corresponding rollback
- No column drops without deprecation period documented

## Output format
Return: PASS / FAIL verdict, blocking issues (must fix), advisory items (should fix).
```

The Skill loads automatically when Claude is doing PR work. For all other tasks, it costs zero context. The expertise is consistent because it is encoded, not relying on prompt memory.
---

### Scenario 4: "Claude runs out of context halfway through auditing a large codebase"

**Team:** Has Claude performing a security audit across a 200,000-line monorepo. It reads files, accumulates findings, and by the time it gets to the authentication module, the context is so polluted with earlier file contents that it misses obvious issues in the auth code.

**Wrong reach:** A Skill for "security audit." A Skill teaches Claude how to audit - it does not solve the problem that the audit of 200,000 lines creates 100,000+ tokens of accumulated file content in the main context.

**Wrong reach:** A Hook that tries to manage context. Hooks fire around tool calls; they cannot solve the structural problem that a single-context agent reading 500 files will accumulate 500 files' worth of tokens.

**Correct diagnosis:** Context accumulation from heavy reading = Subagent.
```markdown
---
# .claude/agents/security-auditor.md
name: security-auditor
description: >
  Audits a specified module or directory for security vulnerabilities.
  Use when a security audit would flood the main context with file contents.
  Receives: a scope (directory or module name) and specific concerns to focus on.
  Returns: structured findings summary only - not file contents.
tools:
  - Read
  - Grep
  - Glob
model: claude-haiku-4-5-20251001
---

# Security Auditor

You receive a scope and a set of security concerns to investigate.
Read all files within scope. Return ONLY:

## Findings
- Critical: [list with file:line references]
- High: [list]
- Medium: [list]

## Files reviewed
Count only. Do not include file paths unless findings reference them.

## Confidence
high | medium | low + one sentence justification.

Return no file contents. Return findings only.
```

The orchestrator delegates each module to a `security-auditor` subagent. Each subagent burns its own context reading files and returns ~400 tokens of findings. The orchestrator synthesizes clean findings across all modules without ever holding a single file's content itself.
If the team also has a specific audit workflow (which modules to prioritize, which vulnerability classes are most common in their stack, what the findings report format looks like for their security team), that becomes a Skill that the orchestrator loads. The Skill teaches the orchestrator how to conduct audits. The Subagent isolates the reading work.
---

### Scenario 5: "Claude sometimes deletes the wrong environment's database"

**Team:** Running Claude autonomously for database migrations. Three environments: dev, staging, prod. Claude correctly identifies the right migration most of the time, but has twice run a migration against staging when the task was scoped to dev.

**Wrong reach:** CLAUDE.md with "never run migrations against staging or prod without explicit confirmation." This has already been tried. The model reads it. Under context pressure from a complex migration task with dozens of files in play, it sometimes misidentifies the environment scope.

**Wrong reach:** A Skill for "safe migrations." The Skill teaches Claude how to run migrations well. It does not prevent Claude from applying that knowledge to the wrong environment.

**Wrong reach:** A subagent that "validates the environment." A subagent that validates before running still relies on the model correctly identifying the environment - and if the model is getting that wrong, the validating subagent will too.

**Correct diagnosis:** "Claude shouldn't be able to do X, no matter what the prompt says" = Hook.
```bash
#!/bin/bash
# .claude/guards/protect-prod-staging.sh
# Blocks any database migration command targeting staging or prod environments
set -euo pipefail

INPUT=$(cat)
COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // empty')

# Check if this is a migration-class command
IS_MIGRATION=false
for pattern in "migrate" "alembic" "flyway" "liquibase" "db:migrate" "rails db"; do
  if echo "$COMMAND" | grep -qi "$pattern"; then
    IS_MIGRATION=true
    break
  fi
done

[ "$IS_MIGRATION" = "false" ] && exit 0

# Check if the command targets a protected environment
IS_PROTECTED_ENV=false
for env in "staging" "prod" "production" "STAGING" "PROD" "PRODUCTION" "--env=prod" "ENV=prod"; do
  if echo "$COMMAND" | grep -q "$env"; then
    IS_PROTECTED_ENV=true
    break
  fi
done

if [ "$IS_PROTECTED_ENV" = "true" ]; then
  jq -n '{
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: "Migration targeting staging/prod requires explicit human approval. Use the migration request workflow instead."
    }
  }'
  exit 0
fi

exit 0
```

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "command": "./.claude/guards/protect-prod-staging.sh" }
        ]
      }
    ]
  }
}
```

The hook fires before every Bash execution. It does not care about Claude's reasoning, context length, or task framing. If the command matches the pattern, it is blocked. The model cannot reason its way around it.
## The Layer Interaction Patterns
Real production setups stack layers to solve composite problems. Understanding the combinations is as important as understanding the layers individually.
MCP + Skill - The most common combination. MCP provides access; the Skill provides expertise. A Sentry MCP server gives Claude access to your error data. A sentry-triage Skill teaches Claude your team's specific triage process: which error classes are critical vs expected, how to correlate errors to recent deployments, what format to use for the incident Slack message. Without the Skill, Claude has access but no workflow. Without the MCP, the Skill has workflow but no data.
CLAUDE.md + Skill - Scope separation by specificity and frequency. CLAUDE.md holds always-applicable facts. Skills hold task-specific workflows. "We use TypeScript with strict mode enabled and prohibit any except in test files" → CLAUDE.md. "How to write a TypeScript migration for a new service including generated client types" → Skill.
Skill + Subagent - Expertise plus isolation. The orchestrator loads a Skill to understand how to conduct an audit. Subagents execute the file reading in isolation to prevent context accumulation. The Skill shapes the orchestrator's judgment. The Subagent protects the orchestrator's context.
Hook + any layer - Hooks enforce what other layers cannot guarantee. A Skill might teach Claude to run the formatter before committing. A Hook runs the formatter after every file edit, unconditionally. The Skill is the preferred behavior. The Hook is the guaranteed behavior. Layer them when the consequence of missing the behavior is meaningful.
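A sketch of that guaranteed-behavior formatter hook in Python (formatter commands and the event shape are assumptions; adapt to your stack):

```python
import json
import subprocess

# Hypothetical PostToolUse hook body: format the edited file unconditionally.
# Which formatter handles which extension is an assumption for illustration.
FORMATTERS = {
    ".py": ["black", "--quiet"],
    ".ts": ["npx", "prettier", "--write"],
}

def pick_formatter(file_path: str):
    """Return the formatter command for this file, or None if unhandled."""
    for ext, cmd in FORMATTERS.items():
        if file_path.endswith(ext):
            return cmd + [file_path]
    return None

def handle_event(raw_event: str) -> int:
    """Handle one PostToolUse event. Always returns 0: formatting is
    best-effort, and a formatter failure should never block the agent."""
    path = json.loads(raw_event).get("tool_input", {}).get("file_path", "")
    cmd = pick_formatter(path)
    if cmd:
        subprocess.run(cmd, check=False)
    return 0
```

Unlike the Skill's "please format before committing," this runs on every edit whether or not the model remembers.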
## The Decision Flow
Use this flow as a first-pass diagnostic before configuring anything:
```mermaid
flowchart TD
    A[Symptom: Claude is doing X wrong\nor cannot do Y]:::blue --> B{Is X wrong because\nClaude lacks a static fact?}:::purple
    B -->|Yes| C[CLAUDE.md:\nAdd to always-on context]:::teal
    B -->|No| D{Does solving it require\nlive access to an external\nsystem or API?}:::purple
    D -->|Yes| E[MCP:\nConnect the external system]:::teal
    D -->|No| F{Is it inconsistent procedural\nbehavior on a specific task type?}:::purple
    F -->|Yes| G[Skill:\nEncode the workflow in SKILL.md]:::teal
    F -->|No| H{Does solving it require\nisolating heavy work from\nthe main context?}:::purple
    H -->|Yes| I[Subagent:\nDelegate with fresh context window]:::teal
    H -->|No| J{Must this never happen\nregardless of what the\nmodel reasons?}:::purple
    J -->|Yes| K[Hook:\nEnforce at the lifecycle level]:::red
    J -->|No| L[Prompt:\nThis is a one-off task,\nnot an infrastructure problem]:::grey
    C --> M{Also needs procedural\nworkflow for that task?}:::purple
    M -->|Yes| G
    E --> N{Also needs expertise on\nhow to use that access?}:::purple
    N -->|Yes| G
    G --> O{Also needs context isolation\nfor the reading/research work?}:::purple
    O -->|Yes| I
    I --> P{Also needs enforcement\nto prevent misuse?}:::purple
    P -->|Yes| K
    classDef blue fill:#4A90E2,color:#fff,stroke:#3A7BC8
    classDef purple fill:#7B68EE,color:#fff,stroke:#6858DE
    classDef teal fill:#98D8C8,color:#fff,stroke:#88C8B8
    classDef red fill:#E74C3C,color:#fff,stroke:#D43C2C
    classDef grey fill:#95A5A6,color:#fff,stroke:#859596
```
The bottom row of combinations shows that the correct answer is often two layers, not one. The diagnostic question is: which layer is the primary solution, and which layer is additive on top of it?
## The Wrong-Layer Tax
Every wrong-layer decision has a cost. Naming it explicitly helps teams justify the diagnostic investment.
Skill put in CLAUDE.md - The workflow detail loads every session, regardless of task. A PR review checklist consuming 2,000 tokens of context on a session where you are only debugging a bash script is 2,000 tokens of noise degrading model attention.
CLAUDE.md fact put in a Skill - The fact only loads when the Skill triggers. If the Skill triggers inconsistently (which skills do, based on description matching), the fact is sometimes absent when Claude needs it. A repo convention that should always apply becomes something Claude knows on some tasks but not others.
MCP access without a Skill - Claude can query your Sentry instance. It will do so generically, without your team's triage conventions, output formats, or prioritization rules. You have the connection but not the expertise.
Skill where you needed a Subagent - The Skill loads the audit workflow into the main context. The file reading for the audit also runs in the main context. The context fills. The Skill's own instructions compete with 80,000 tokens of file content for attention. Performance degrades.
Subagent where you needed a Skill - You build a subagent that handles code review. The subagent applies its own ad-hoc review criteria (not your team's standards) in isolation. You have context isolation but no consistent expertise. The subagent should have loaded a Skill.
Prompt instruction where you needed a Hook - "Never delete production data." Claude follows it almost always. The one time it does not is the incident. Probabilistic safety on irreversible operations is a production liability.
This is what the slide names as "the common failure mode: reaching for subagents when you needed a skill; reaching for a skill when you needed an MCP server." The wrong layer is not just the wrong tool - it is a configuration that adds cost and complexity while failing to address the actual problem.
## The Layer-to-Problem Mapping: Reference Summary
| Layer | Solves | Does not solve | Loaded when |
|---|---|---|---|
| CLAUDE.md | Static project facts, always-applicable conventions | Procedural workflows, external access, enforcement | Every session, always |
| MCP | Live access to external systems and APIs | How to use that access expertly, enforcement | When Claude calls the MCP tool |
| Skill | Procedural expertise for specific task types | External access, context isolation, enforcement | When task matches description |
| Subagent | Context isolation for heavy or verbose work | Expertise, external access, enforcement | When orchestrator delegates |
| Hook | Deterministic enforcement of non-negotiable policy | Expertise, external access, context isolation | At lifecycle event, always |
The diagnostic question for any new problem: which cell in this table does my problem live in?
Build the layer that solves the problem. Don't build the layer that sounds like it should solve the problem.
## The Layer Build Order for New Projects
The official Claude Code documentation recommends this sequence, and it maps to the diagnostic framework precisely:
Start with CLAUDE.md. Add your tech stack, repository conventions, architectural decisions. Test that Claude applies them consistently before adding anything else. This is cheap to write and has immediate, always-on impact.
Add Skills when you see repeated inconsistency. Every time Claude does something in a way you have to manually correct, that is a Skill waiting to be written. Not a one-off - a pattern. A pattern means the correction needs to persist.
Add MCP when Claude needs external access. A specific, named external system that Claude needs to read from or write to. Not "Claude should be smarter about external data generally" - a specific integration with a specific system.
Add Subagents when context fills before the task is done. The diagnostic signal is clear: the agent degrades mid-task because the context is saturated. That is when you isolate the heavy reading into fresh windows.
Add Hooks for policy that cannot be probabilistic. Any rule where the consequence of violation is irreversible or regulatory. Tests must pass. Secrets must not be committed. Production environments must not be modified without approval.
This sequence is not arbitrary. Each layer has lower complexity than the one after it. Most problems encountered in the first few months of using Claude Code are solved by the first two. Add the others only when a specific gap appears that the previous layers cannot address.
## Production Checklist: Layer Selection Before You Build
Before adding any configuration to your Claude Code setup:
### Diagnosis
- Can you state the symptom in one concrete sentence (like the examples in the slide table)?
- Have you identified which of the five problem types your symptom maps to?
- Have you ruled out the adjacent wrong layers explicitly?
### CLAUDE.md additions
- Is this a static fact that should apply to every task in every session? (If yes - CLAUDE.md. If it only matters for one task type - Skill.)
- Is it specific enough to not conflict with or contradict existing entries?
- Have you checked that it is not a workflow that should be a Skill instead? (Remember: every token in CLAUDE.md is paid on every session - a 2,000-token workflow checklist costs 2,000 tokens even when Claude is writing a bash script. That is the Wrong-Layer Tax.)
### MCP additions
- Is there a specific named external system Claude needs access to?
- Does a production-ready MCP server exist for it, or do you need to build one?
- Have you considered adding a Skill on top for expertise in using that access?
### Skill additions
- Is there a task type where Claude's behavior is inconsistently wrong?
- Is the Skill description precise enough to trigger on the right tasks and not on others?
- Have you run evals with and without the Skill to verify it actually helps?
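That last check can be as simple as comparing pass rates over a fixed task set. A hypothetical harness sketch (result shape and threshold are illustrative):

```python
# Hypothetical Skill eval: run the same task set with and without the Skill,
# then compare pass rates. Each result is a dict like {"task": str, "passed": bool}.
def pass_rate(results):
    """Fraction of tasks that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)

def skill_helps(baseline, with_skill, min_lift=0.05):
    """A Skill earns its context cost only if it lifts pass rate by min_lift."""
    return pass_rate(with_skill) - pass_rate(baseline) >= min_lift
```

If the lift is below the threshold, the Skill is context cost without behavioral benefit.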
### Subagent additions
- Is the primary symptom context saturation (not inconsistency or lack of access)?
- Is the work you are delegating genuinely bounded and self-contained?
- Does the subagent definition include an explicit output contract?
### Hook additions
- Is this a rule that must hold even when the model believes violating it is correct?
- Does the enforcement script use `permissionDecision: "deny"` or `exit 2` (not `exit 1`)?
- Is the hook fast enough (<500ms for `PreToolUse`) to not add perceptible latency?
## References
- Anthropic / Claude Code Docs. (2026). Extend Claude Code - Features overview. https://code.claude.com/docs/en/features-overview
- Anthropic. (2025-2026). Skills explained: How Skills compares to prompts, Projects, MCP, and subagents. https://claude.com/blog/skills-explained
- Anthropic / Claude Code Docs. (2026). Hooks reference. https://code.claude.com/docs/en/hooks
- Anthropic / Claude Code Docs. (2026). Create custom subagents. https://code.claude.com/docs/en/sub-agents
- Anthropic / Claude Code Docs. (2026). Agent Skills overview. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
- Anthropic. (October 2025). Equipping agents for the real world with Agent Skills. https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- Anthropic. (September 2025). Effective context engineering for AI agents. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- AlexOp. (April 2026). Understanding Claude Code's Full Stack: MCP, Skills, Subagents, and Hooks Explained. https://alexop.dev/posts/understanding-claude-code-full-stack/
- Skiln. (2026). Claude Code Plugins vs Skills vs MCP Servers: A Developer's Decision Guide. https://skiln.co/blog/claude-code-plugins-vs-skills-vs-mcp-decision-guide
- Verdent. (2026). Claude Skills vs MCP vs Agents: Key Differences. https://www.verdent.ai/guides/claude-skills-vs-mcp-agents-comparison
- Penligent. (April 2026). Inside Claude Code: The Architecture Behind Tools, Memory, Hooks, and MCP. https://www.penligent.ai/hackinglabs/inside-claude-code-the-architecture-behind-tools-memory-hooks-and-mcp/
- GUVI. (2026). Claude Code Skills Comparison: Choose the Right Tool. https://www.guvi.in/blog/claude-code-skills-comparison/
- Trensee. (March 2026). Claude Code Advanced Patterns: How to Connect Skills, Fork, and Subagents. https://www.trensee.com/en/blog/explainer-claude-code-skills-fork-subagents-2026-03-31
- Model Context Protocol. MCP Specification and Architecture. https://modelcontextprotocol.io/docs/getting-started/intro
## Related Articles
- Agent Skills Are Not Prompts. They Are Production Knowledge Infrastructure.
- Claude Code Guide: Build Agentic Workflows with Commands, MCP, and Subagents
- Subagents: How to Run Parallelism Inside a Single Agent Session Without Poisoning the Parent
- Hooks: The Enforcement Layer That Turns Agent Policy Into Agent Fact