Guide · March 2026

From Unknown Codebase to Architecture Document: A Complete Practitioner's Guide

A 3-pass methodology for compressing any codebase - in any language, any architecture style - into validated diagrams, debt scores, and decisions that engineering teams and stakeholders can actually act on

Works with: Any language · Any framework · Any architecture style
Stack-specific appendices: Java/Spring/JSP · Python/Django/FastAPI · Node.js/Express · .NET/C# · Ruby on Rails · Go


How to Use This Guide

This guide has two layers:


Figure: How to Use This Guide

Start here. Complete the core guide with your codebase. When a step says "identify entry points" or "run dependency analysis," jump to your stack appendix for the exact files, commands, and patterns to look for.

Not every reader needs to start at page 1. Jump directly to what matters for your situation:

Your situation | Go directly to
"I have 2 hours and need something now" | Time-Box Strategy → Minimum Viable Doc
"I'm presenting to a CTO / VP next week" | Audience Adaptation → Output Template
"I need to audit an unfamiliar codebase" | Pass 1 → Anti-Pattern Checklist
"I have a doc but need to validate it" | Architecture Validation Loop
"I need to track architecture changes over time" | Architecture Evolution & Decision Records
"My team disagrees on the architecture" | Team & Org Context
"I want to score debt and prioritize fixes" | Debt Risk × Effort Matrix
"My system is microservices or event-driven" | If Your System Is Microservices or Event-Driven
"My system is serverless" | If Your System Is Serverless
"I'm using AI to help with analysis" | AI-Assisted Workflow → Prompt Library

Core Mental Model

Code = Details
Architecture = Compression of intent

Rule: If it cannot be drawn, it is not architecture.
Every architectural claim must be expressible as a diagram - a component box, an arrow, a sequence, or a boundary. If you can only describe it in prose, keep digging until you find the shape.

Think in 3 passes:

Pass | Focus | Output
1 | Structure discovery | Modules, services, entry points
2 | Behavior understanding | Data flow, request traces, state
3 | Abstraction | Clean architecture doc

⏱️ Time-Box Strategy

Match depth of analysis to time available. Start small - the 2-hour version forces you to find the 20% that explains 80% of behavior.

Time available | Strategy
2 hours | Entry points only + trace 1 core flow. Produce: layer map + 1 sequence diagram.
1 day | Full Pass 1 + 2 key flows. Produce: component diagram + 2 sequence diagrams + debt list.
3–5 days | All 3 passes + complete output doc. Produce: full architecture document (all 12 sections).
1–2 weeks | Above + runtime validation (profiling, log analysis, developer interviews).

Pass 1 - Structural Mapping (What exists?)

Objective: Build a complete map of the system before understanding behavior. Entry points and module boundaries are defined differently per stack - see your appendix for specifics.

Step 1 - Identify Entry Points

Every system has entry points: the places where external input arrives. Find them all before going deeper.

Entry point type | What to look for (generic; stack appendix has specifics)
HTTP entry | Route definitions, URL mappings, controller registrations
Config / DI root | Dependency injection setup, service wiring, IoC container config
Background jobs | Cron expressions, queue consumers, event listeners, schedulers
CLI entry | Main functions, command definitions, argument parsers
Event triggers | Message broker subscriptions, webhook receivers, pub-sub listeners

Entry points define system boundaries. If you haven't found all of them, your component diagram will have invisible holes.
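The scan itself can be mechanised. Below is a minimal sketch in Python, assuming a Python codebase with Flask/Celery-style idioms; every pattern (and the `*.py` glob) is an illustrative placeholder to swap for the idioms listed in your stack appendix:

```python
import re
from pathlib import Path

# Illustrative patterns for a Python codebase - swap in your stack's idioms
# (route annotations, IoC config, queue consumers - see your appendix).
ENTRY_POINT_PATTERNS = {
    "HTTP entry":      re.compile(r"@(app|router)\.(get|post|put|delete|route)\b"),
    "Background jobs": re.compile(r"@(app\.task|scheduler\.scheduled_job)\b"),
    "CLI entry":       re.compile(r"if __name__ == ['\"]__main__['\"]"),
    "Event triggers":  re.compile(r"\.(subscribe|consume|on_message)\("),
}

def scan_entry_points(root: str) -> dict:
    """Walk the source tree, recording file:line for every entry-point match."""
    found = {kind: [] for kind in ENTRY_POINT_PATTERNS}
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for kind, pattern in ENTRY_POINT_PATTERNS.items():
                if pattern.search(line):
                    found[kind].append(f"{path.name}:{lineno}")
    return found
```

The output doubles as the raw material for the entry-point table above: any category that comes back empty is either genuinely absent or a hole in your patterns - verify which.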

Step 2 - Extract Module Structure

Every codebase has a layering strategy - even if it isn't enforced. Look for these universal layers, regardless of what they're named in your stack:


Figure: System Layer Diagram

Map your codebase's actual folder/module structure onto these layers. Do not assume they're clean - verify it.

Step 3 - Identify Dependencies

Three things to extract, regardless of stack:

1. Build file / package manifest
Every stack has one. This is your fastest source of: language version, framework choice, and all third-party dependencies.

Stack | Build file
Java | pom.xml, build.gradle
Python | requirements.txt, pyproject.toml, Pipfile
Node.js | package.json, yarn.lock
.NET | *.csproj, packages.config, NuGet.config
Ruby | Gemfile, Gemfile.lock
Go | go.mod, go.sum

2. Dependency graph between modules
Use static analysis to find which module imports which. Look for violations: does your data layer import from the HTTP layer? Does a domain model import an HTTP client? See appendix for tools.

3. Circular dependencies
Circular imports between modules are a strong signal of missing abstraction. Every stack has a tool to detect them - see appendix.
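To make the two checks above concrete, here is a hand-rolled sketch in Python; the module names and edges in `deps` are hypothetical, standing in for an import graph you would extract with your stack's tooling:

```python
# Module-level import graph, e.g. produced by parsing import statements.
# The modules and the suspicious back-edge below are hypothetical.
deps = {
    "handlers": {"services"},
    "services": {"repositories", "clients"},
    "repositories": {"models"},
    "models": {"services"},   # back-edge: domain layer imports the service layer
    "clients": set(),
}

def find_cycles(graph: dict) -> list:
    """Depth-first search that records every cycle it closes."""
    cycles, path, visited = [], [], set()

    def visit(node):
        if node in path:                              # back-edge closes a cycle
            cycles.append(path[path.index(node):] + [node])
            return
        if node in visited:
            return
        visited.add(node)
        path.append(node)
        for dep in graph.get(node, ()):
            visit(dep)
        path.pop()

    for node in graph:
        visit(node)
    return cycles
```

Each reported cycle names the modules that share a missing abstraction - usually the fix is extracting an interface or a shared kernel module that all participants can depend on.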


Pass 2 - Behavioral Mapping (How does it work?)

Objective: Stop thinking about files. Think about what happens when a user does something. Trace flows through the full stack.

Step 1 - Flow Prioritization Strategy

Not all flows are equally important. Trace in this order:

Priority | Flow type | Rationale
1 - Revenue-generating | Order placement, payment, account creation | Business stops if these break. Highest scrutiny.
2 - Highest-frequency | Search, list views, dashboard load | Performance bottlenecks live here.
3 - Most complex | Multi-step workflows, approval chains, batch | Hidden state and race conditions live here.
4 - Most failure-prone | External integrations, file uploads, scheduled jobs | Error paths are usually underdocumented.
5 - Authentication | Login, session/token management, logout | Defines the trust model for everything else.

Stop when you can answer: "What does this system actually do, and where does it break?"

Step 2 - Trace Key Request Flows

For each selected flow, trace the full path from entry to response:


Figure: Request Flow Diagram

For each hop, capture:

  • What data enters and exits
  • Where state changes (DB write, cache update, session mutation)
  • Where the transaction boundary is (if applicable)
  • What happens on failure at this hop

Step 3 - Map State, Async, and Concurrency

These three are the most underdocumented aspects of any system:

State management
Where does the system store state between requests? Options: session, JWT, database, cache, in-memory. List every state store and what it holds.

Async boundaries
Where does execution leave the request thread? Message queues, async/await, background workers, webhooks. Each async boundary is an invisible failure point if not documented.

Concurrency
Where can two requests race? Shared in-memory state, non-atomic DB operations, cache-then-write patterns. Document these explicitly - they're where production incidents come from.
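A minimal Python sketch of the race class described above - a read-modify-write on shared in-memory state, with the synchronised variant alongside. The counter is a stand-in for any state two concurrent requests can touch:

```python
import threading

counter = {"value": 0}          # stands in for any shared in-memory state
lock = threading.Lock()

def unsafe_increment(n: int) -> None:
    # Read-modify-write with no synchronisation: two threads can read the
    # same value and one of the two writes is lost - this is the race.
    for _ in range(n):
        counter["value"] = counter["value"] + 1

def safe_increment(n: int) -> None:
    for _ in range(n):
        with lock:              # the whole read-modify-write is atomic
            counter["value"] = counter["value"] + 1

def run(worker, n: int = 100_000, threads: int = 4) -> int:
    counter["value"] = 0
    pool = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter["value"]
```

The same shape hides in non-atomic DB updates (`SELECT` then `UPDATE`) and cache-then-write patterns - when you find one, document it explicitly rather than assuming traffic is too low to trigger it.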


Pass 3 - Abstraction (Think Like an Architect)

Objective: Compress what you've learned into diagrams and principles. Stop thinking like a developer.

Step 1 - Apply Architecture Compression Rules

This is the step most engineers skip. Without it, two engineers using this guide produce completely different outputs.

Rule 1: Collapse classes/functions into capabilities

❌ Too low-level:              ✅ Compressed:
UserService                    User Management
UserValidator             →    (registration, auth,
UserEmailService               profile, notifications)
UserProfileService

Rule 2: Collapse endpoints into use cases

❌ Too low-level:              ✅ Compressed:
POST /api/orders               Place Order
GET  /api/orders/{id}     →    Track Order
PUT  /api/orders/{id}          Modify Order
DELETE /api/orders/{id}        Cancel Order

Rule 3: Collapse tables/collections into domain concepts

❌ Too low-level:              ✅ Compressed:
orders                         Order (aggregate)
order_lines               →    ├── line items
order_status_history           ├── status history
order_payment_refs             └── payment ref

Rule 4: Collapse integrations into roles

❌ Product-named:              ✅ Role-named:
Stripe                         Payment Gateway
Twilio                    →    Notification Service
Elastic                        Search Engine

Rule 5: Express layer interactions in one sentence per boundary

"The handler layer delegates all business decisions to the service layer. The service layer owns consistency boundaries and orchestrates repositories. Repositories are the only components that touch the data store."

If you cannot write this cleanly, the layers are not clean. Document the violation.
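The boundary sentences can double as an executable rule set. A sketch in Python, with illustrative layer and module names - the point is that "the handler layer delegates to the service layer" becomes a checkable edge list:

```python
# Allowed dependency directions, one entry per boundary sentence.
# Layer names and the module-to-layer mapping are illustrative.
ALLOWED = {
    "handler":    {"service"},
    "service":    {"repository", "client"},
    "repository": set(),        # touches only the data store, no internal layers
    "client":     set(),
}

def layer_violations(imports: dict, layer_of: dict) -> list:
    """Flag every import whose direction the boundary rules do not allow."""
    violations = []
    for module, targets in imports.items():
        for target in targets:
            src, dst = layer_of[module], layer_of[target]
            if src != dst and dst not in ALLOWED[src]:
                violations.append(f"{module} ({src}) -> {target} ({dst})")
    return violations
```

Each flagged edge is either a real violation for the debt list or evidence that your boundary sentence is wrong - both outcomes improve the document.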

Step 2 - Identify the Architecture Style

Style | Universal signals | Implication
Layered monolith | Single deployable, shared DB, clear package layers | Simple ops, hard to scale parts independently
Modular monolith | Multiple internal modules with defined boundaries, single deploy | Better separation, still one deployment unit
Microservices | Multiple deployables, each with own DB, API-to-API communication | Independent scaling, complex distributed failure modes
Event-driven | Message broker central, components subscribe to events | Loose coupling, hard to trace flows end-to-end
Serverless | Functions as entry points, no persistent process | Low ops overhead, cold start and state limitations
Transitional | Mix of the above - some parts modernised, others legacy | Document which style governs which part explicitly

If your system is Microservices or Event-Driven: The 3-pass methodology still applies, but the artifacts and tools differ. See the callout below before continuing.


If Your System Is Microservices or Event-Driven

The component diagram, sequence diagram, and anti-pattern checklist all remain valid - but three additional documentation concerns apply that don't exist in monolithic systems.

1. Service Map (replaces the component diagram)

Instead of one component diagram, produce a service map: one box per deployable service, arrows showing synchronous calls (HTTP/gRPC) and asynchronous events (message broker topics). Each box must show:

┌──────────────────────────────┐
│  Order Service               │
│  Owner: Platform team        │
│  Language: Java 17           │
│  DB: PostgreSQL (orders_db)  │
│  Exposes: REST /orders/*     │
│  Consumes: payment.completed │
│  Emits: order.placed         │
└──────────────────────────────┘

Key additions vs monolith component diagram:

  • Each service owns its own DB - document which DB per service
  • Every event topic must be named and documented (publisher + all subscribers)
  • Synchronous vs asynchronous calls must be visually distinct (solid vs dashed arrows)

2. Inter-Service Contract Documentation

In a monolith, interfaces are enforced by the compiler. In microservices, they are enforced by nothing - they drift silently. Document every service boundary:

Contract type | Tool | What to capture
REST APIs | OpenAPI / Swagger | Endpoints, request/response schema, error codes
Async events | AsyncAPI | Topic names, message schema, producer, consumers
gRPC | .proto files | Service definition, message types, versioning

Minimum requirement: every service must have an OpenAPI or AsyncAPI spec checked into its repository. If it doesn't exist, creating it is the first Pass 1 output for that service.

3. Distributed Tracing Integration

Sequence diagrams that cross service boundaries cannot be verified by reading code alone. They must be validated using distributed tracing.

# If OpenTelemetry is instrumented - query traces for the flow you documented
# Jaeger UI: http://localhost:16686
# Filter by: operation name matching your entry point
# Verify: does the trace span tree match your sequence diagram?

# If no tracing is instrumented - this is the highest-priority debt item
# for any microservices system. Add it before documenting other flows.

Distributed tracing is Gate 3 (runtime validation) for microservices. Without it, sequence diagrams for cross-service flows are educated guesses.

Additional anti-patterns specific to microservices/event-driven:

DISTRIBUTED SYSTEMS
[ ] Synchronous chain of 3+ services (distributed monolith - single point of failure)
[ ] No circuit breaker between services (cascade failure risk)
[ ] Shared database between services (defeats service isolation)
[ ] No distributed tracing instrumented (flows are unverifiable)
[ ] Event topics with no schema registry (consumer drift)
[ ] No dead-letter queue for failed event processing
[ ] Services communicating without contract/spec (implicit coupling)
[ ] No idempotency on event consumers (duplicate processing risk)

Add these to your anti-pattern checklist score if your system is microservices or event-driven.
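The idempotency item on the checklist is worth a concrete shape. A minimal Python sketch of an idempotent consumer, deduplicating by message id; in production `processed_ids` would be a durable store (a DB table or Redis set), not process memory:

```python
processed_ids = set()   # production: a durable store (DB table, Redis set)
orders = {}

def handle_order_placed(event: dict) -> bool:
    """Consume an order.placed event safely under at-least-once delivery.

    The broker may deliver the same message twice, so side effects are
    applied only once per stable message id.
    """
    if event["id"] in processed_ids:
        return False                        # duplicate delivery - skip
    orders[event["order_id"]] = "PLACED"    # the side effect
    processed_ids.add(event["id"])
    return True
```

Note the ordering: the dedupe record is written after the side effect, so a crash between the two causes a retry rather than a lost update - the tradeoff to document per consumer.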


If Your System Is Serverless

Trigger: Functions as entry points, no persistent process (AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers, Vercel Edge Functions).

The 3-pass methodology applies, but serverless introduces four documentation concerns unique to this style.

1. Function Inventory (replaces the component diagram)

In serverless, the unit of deployment is a function, not a service. Your component diagram becomes a function map:

┌──────────────────────────────────────┐
│  ProcessOrder (Lambda)               │
│  Trigger: API Gateway POST /orders   │
│  Runtime: Node.js 20, 512MB, 30s     │
│  Reads: orders_table (DynamoDB)      │
│  Writes: order.placed (EventBridge)  │
│  Cold start: ~800ms (p95)            │
│  Warm invocations: ~45ms (p95)       │
└──────────────────────────────────────┘

Key fields every function entry must include:

  • Trigger - what invokes it (API Gateway, EventBridge rule, SQS, S3 event, cron)
  • Runtime + memory + timeout - these are architectural decisions, not config details
  • Cold start p95 - the hidden latency that breaks SLAs at low traffic
  • State it reads/writes - functions are stateless; state lives elsewhere, document where

2. State Inventory (the critical artifact)

Serverless functions are stateless by design. All state lives outside the function. Document every state store explicitly - this is the architectural skeleton of a serverless system:

State type | Store used | What it holds | Access pattern
Persistent data | DynamoDB / RDS / S3 | Orders, customers, products | Read-heavy, keyed by ID
Session / auth | JWT (stateless) or ElastiCache | User claims | Per-request verification
Distributed cache | ElastiCache / Redis | Product catalog, config | TTL-based, warm reads
Async queue | SQS / EventBridge | Events between functions | At-least-once delivery
File / blob | S3 | Uploads, reports, exports | Event-triggered processing

If a function reads or writes to something not on this list, the state inventory is incomplete.

3. Cold Start Documentation

Cold start latency is the most common serverless architectural gap - it's invisible during development and catastrophic in low-traffic production. Document it explicitly:

Cold start profile - capture per function:
  Trigger type:          API Gateway (synchronous - user waits)
  Measured cold start:   ~800ms p95 (Java runtime, 512MB)
  SLA requirement:       < 500ms p95
  Gap:                   300ms over SLA
  Mitigation options:
    → Provisioned concurrency (cost: ~$0.015/hr per instance)
    → Runtime switch to Node.js or Python (reduces to ~150ms p95)
    → Architectural: move to container service for this function
  Decision:              [pending - ADR-007]

Functions on synchronous user-facing paths (API Gateway, ALB) must have cold start latency explicitly accepted or mitigated. Functions on async paths (SQS, EventBridge) can tolerate cold starts - document which category each function falls into.

4. Execution Boundary Tracing

Serverless flows are harder to trace than monolith flows because execution is distributed across functions with no shared call stack. Every flow must be documented as a chain of trigger → function → output:

Place Order flow (serverless):
1. User → API Gateway POST /orders
2. → ProcessOrder (Lambda, sync)
       → validates input
       → writes ORDER record (DynamoDB)
       → emits order.placed (EventBridge)
       → returns 202 Accepted
   [user response ends here]
3. order.placed event → EventBridge rule → ReserveInventory (Lambda, async)
       → reads PRODUCT record (DynamoDB)
       → updates STOCK_LEVEL (DynamoDB, conditional write)
       → emits inventory.reserved OR inventory.insufficient
4a. inventory.reserved → ChargePayment (Lambda, async)
       → calls PaymentGateway (external, 30s timeout)
       → writes PAYMENT record (DynamoDB)
       → emits payment.completed OR payment.failed
4b. inventory.insufficient → NotifyUser (Lambda, async)
       → sends email via SES
       → writes ORDER status = CANCELLED (DynamoDB)

Every async hop is a potential failure point. Document the dead-letter queue and retry behavior at each async step.
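A sketch of the retry-then-dead-letter behavior to document at each async hop, in Python with an in-memory list standing in for the real DLQ; the attempt count and handler are illustrative:

```python
MAX_ATTEMPTS = 3
dead_letter_queue = []   # stands in for the real DLQ (e.g. an SQS queue)

def process_with_retry(event: dict, handler) -> bool:
    """Run an async handler with bounded retries, then dead-letter the event.

    This is exactly the behavior to write down per async hop: how many
    attempts, and where the message lands when they are exhausted.
    """
    last_error = ""
    for _ in range(MAX_ATTEMPTS):
        try:
            handler(event)
            return True
        except Exception as err:
            last_error = str(err)
    dead_letter_queue.append(
        {"event": event, "error": last_error, "attempts": MAX_ATTEMPTS}
    )
    return False
```

If a hop has no answer for "what handler drains the dead-letter queue, and when?", that is a silent-failure finding for the debt list.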

Additional anti-patterns specific to serverless:

SERVERLESS
[ ] Synchronous function chain > 3 hops (compounds cold start latency)
[ ] Function timeout set to maximum (masks slow dependencies)
[ ] Shared mutable state between function invocations (race condition)
[ ] No cold start latency documented for user-facing functions
[ ] Missing dead-letter queue on async triggers (silent failure)
[ ] Database connection pool not managed per invocation (connection exhaustion)
[ ] Secrets hardcoded in environment variables (should use Secrets Manager)
[ ] No distributed tracing (X-Ray / OpenTelemetry) - flows unverifiable
[ ] Function doing > 1 business responsibility (violates single-purpose principle)

Add these to your anti-pattern checklist score if your system is serverless.


Step 3 - Identify Cross-Language Design Patterns

These patterns appear in every stack. Learn to recognise them regardless of language:

Pattern | What it looks like (language-agnostic)
Repository | A dedicated class/module for all data access - the only place that touches the DB
Service/Use Case | Orchestrates multiple repositories and domain objects for one business operation
Adapter/Client | Wraps an external API, translates its interface to your domain's language
Middleware/Filter | Intercepts every request - used for auth, logging, rate limiting
Factory | Centralises object creation, hides construction complexity
Observer/Event bus | Decouples components by emitting events rather than calling directly
CQRS | Separates read and write models - two different paths through the stack
Saga | Manages multi-step distributed transactions through compensating actions

Step 4 - Document Actual vs Intended Architecture

The most valuable output is the gap analysis: what the architecture was designed to be vs what it has become.

Area | Universal check
Layer violations | Does any layer import from a layer it shouldn't depend on?
Fat handlers | Do HTTP handlers contain business logic beyond routing and input parsing?
Logic in views | Do templates/views make data decisions rather than just rendering?
God modules | Does one module own too many unrelated responsibilities?
Bypassed consistency | Are writes happening outside the documented consistency boundary?
Undocumented async | Are there background operations not represented in the architecture?

Anti-Pattern Detection Framework

Run this language-agnostic checklist against any codebase. Score your findings.

LAYER VIOLATIONS
[ ] Handler/controller directly calls data access layer (bypasses service)
[ ] Domain/model layer imports from HTTP or UI layer
[ ] Data access layer contains business logic beyond querying
[ ] Service layer imports from handler/controller layer

HANDLER / VIEW LAYER
[ ] Handler method exceeds ~50 lines of logic
[ ] Business rules computed inside a template or view
[ ] State mutated directly in the handler without going through a service
[ ] Auth checks scattered across handlers (not centralised in middleware)

CONSISTENCY / STATE
[ ] Multi-step write operation with no rollback/compensation if step N fails
[ ] In-memory state shared across requests without synchronisation
[ ] Cache written before DB (or vice versa) without atomic update
[ ] Session/token stores domain objects that should live in the DB

SERVICE / ORCHESTRATION LAYER
[ ] God service: one module/class handling > 3 distinct business domains
[ ] Business logic duplicated across multiple services
[ ] Service calls another service directly, creating hidden coupling
[ ] No clear owner for a cross-domain operation (logic scattered)

DATA ACCESS
[ ] N+1 queries (queries inside loops)
[ ] Raw query strings concatenated with user input (injection risk)
[ ] No query abstraction - SQL/query language used directly across many layers
[ ] Missing index on frequently filtered or joined column

INTEGRATIONS
[ ] External API called directly from service with no adapter/client wrapper
[ ] No timeout configured on outbound HTTP calls
[ ] No retry or circuit breaker on failure-prone external calls
[ ] External API error codes leaked as-is into internal domain errors

Scoring Guide

Score | Health
0–3 violations | Healthy - normal technical debt
4–8 violations | Moderate debt - prioritize top 3 for refactoring
9–15 violations | High debt - architecture needs active remediation
16+ violations | Critical - modernization planning required before feature work

Common Failure Modes

Use these as a self-correction checklist before sharing any architecture output.

❌ Failure 1 - Too Granular (Function/Class-Level Diagram)

Signal: More than 12 components in your diagram.
Fix: Apply Compression Rule 1. Merge anything that serves the same capability. A good architecture diagram has 5–12 components.

❌ 42 components (basically a class diagram)
✅ 7 capabilities: User Mgmt · Order Mgmt · Payment · Inventory · Notification · Reporting · Admin

❌ Failure 2 - Too Abstract (No Actionable Structure)

Signal: Every component description uses only business words with no pointer to real code.
Fix: Every component must map to at least one real module, class, or file in the codebase.

❌ "The system manages the customer journey end-to-end"
✅ OrderService (src/services/order.py) → handles place, modify, cancel operations
     depends on: InventoryRepository, PaymentClient, NotificationService

❌ Failure 3 - Wrong Boundaries (Domain Leakage)

Signal: A component's responsibility statement uses "and" more than once across unrelated domains.
Fix: Split along noun/domain boundaries. Payment belongs in a Payment component even if OrderService currently calls it.

❌ OrderManagement: orders AND payments AND invoices AND customer lookup
✅ OrderManagement: order lifecycle only
   PaymentProcessing: charge, refund, gateway
   CustomerManagement: lookup, profile
   Invoicing: document generation

❌ Failure 4 - Missing Flows (Static Architecture Only)

Signal: The doc has component diagrams but no sequence diagrams.
Fix: Add at least one end-to-end sequence diagram for the primary revenue-generating flow.

❌ Doc contains: component diagram + layer table + data model
   Missing: any sequence diagram showing what actually happens at runtime
✅ Add: Key Flow - Place Order
   User → POST /orders → AuthMiddleware → OrderHandler
        → OrderService (consistency boundary begins)
             → InventoryRepo.reserve()
             → PaymentClient.charge()     [external, outside boundary]
             → OrderRepo.save()
        → (consistency boundary commits)
        → response 201

❌ Failure 5 - Aspirational Architecture (Documents Intent, Not Reality)

Signal: The doc was written from design docs or README, not from reading the actual code.
Fix: Run the anti-pattern checklist. Mark anything unverified as [unverified - needs validation]. Get a developer who works in the codebase daily to review it.

❌ "Clean layered architecture, no business logic in handlers"
   (written from the design doc, not the code)
✅ "Intended: layered. Actual: 14 handler methods contain business logic.
   See debt list section 11 for full inventory."

Quick Self-Correction Checklist

[ ] Component count is 5–12
[ ] Every component maps to a named capability (not a function/class name)
[ ] No component description uses "and" across unrelated domains
[ ] At least 1 end-to-end sequence diagram exists
[ ] Every external dependency has a role name (not just a product name)
[ ] Document validated against actual code (not just design docs or README)
[ ] A developer has confirmed the component boundaries
[ ] Architectural debt is documented, not hidden

Golden Output Example

This is the calibration target. Your output should be this compressed, this diagram-driven, and this decision-ready. The example uses an e-commerce order system - adapt the structure to your domain.


System: E-Commerce Order Processing Platform
Language/framework: (fill in your stack)
Architecture style: Layered Monolith


Component Overview

┌──────────────────────────────────────────────┐
│            Entry / Handler Layer              │
│  OrderHandler  PaymentHandler  ReportHandler  │
└──────────────┬───────────────────────────────┘
               │ delegates all business logic
┌──────────────▼───────────────────────────────┐
│         Orchestration Layer                   │
│  OrderManagement  PaymentProcessing           │
│  InventoryControl  NotificationService        │
└────────┬──────────────────────┬──────────────┘
         │ reads/writes          │ calls
┌────────▼──────────┐  ┌────────▼──────────────┐
│  Data Access       │  │  Integration Layer     │
│  OrderRepo         │  │  PaymentGateway        │
│  InventoryRepo     │  │  ERPAdapter            │
│  CustomerRepo      │  │  EmailNotifier         │
└────────┬──────────┘  └───────────────────────┘
┌────────▼──────────┐
│  Data Store        │
│  (primary DB)      │
└───────────────────┘

Key Flow: Place Order

User → POST /orders → AuthMiddleware → OrderHandler
     → OrderManagement.placeOrder()
          [consistency boundary begins]
          → InventoryRepo.reserve()
          → PaymentGateway.charge()        [external - outside boundary]
          → OrderRepo.save()
          → NotificationService.send()     [async, fire-and-forget]
          [consistency boundary commits]
     → 201 response

Domain Concepts

Concept | Storage | Aggregate root
Order | orders, order_lines, order_status | Order
Customer | customers, addresses | Customer
Inventory | products, stock_levels, reservations | Product
Payment | payment_records, payment_attempts | Payment

Architecture Health

Check | Status
Layer violations | 2 found - document in debt list
Logic in handlers | 6 handlers contain business logic
God modules | None - services well-scoped
Async boundaries documented | Notification only
Anti-pattern score | 5/35 - Moderate debt

Modernization Priorities

  1. Move business logic out of 6 handlers → service layer (low risk)
  2. Extract PaymentProcessing into standalone module (enables independent deploy)
  3. Add circuit breaker to PaymentGateway client (reduces incident blast radius)

Architecture Validation Loop

Architecture is only trustworthy when validated. The loop has 5 steps - do not skip Step 5.

Pass 1 → Pass 2 → Pass 3
         [Draft doc]
    Gate 1: Static validation
    Gate 2: Developer validation
    Gate 3: Runtime validation
    Gate 4: Failure validation
    Step 5: Refinement (update diagrams + debt list)
         [Published doc]
    Cadence: re-validate quarterly or after major release
              ↑___________________________________|

Gate 1 - Static Validation

Goal: Confirm your component diagram reflects what the code actually imports, not what you assumed.

  • Run your stack's dependency analysis tool (see appendix)
  • Check: does every arrow in your diagram correspond to a real import?
  • Check: are there imports in the code that have no arrow in your diagram?
  • Run the anti-pattern checklist - score it

Failure signal: Any module importing from a layer it shouldn't depend on.

Gate 2 - Developer Validation

Goal: Confirm component names and boundaries match how the team thinks about the system.

Who: The engineer with the most commits in the past 12 months - not the original designer.

Questions to ask:

1. "Does this component diagram match how you'd explain the system to a new hire?"
2. "Are there components I've merged that you'd keep separate?"
3. "Is there behavior that doesn't appear in these diagrams?"
4. "Does this consistency boundary diagram match what actually commits and rolls back?"
5. "Which parts would you argue with?"

Failure signal: Developer disputes more than 2 component boundaries - compression decisions were wrong. Redo with their input.

Gate 3 - Runtime Validation

Goal: Confirm your documented flows match actual production behavior.

  • Extract most-called endpoints from access logs
  • Find most-executed code paths from APM or tracing
  • Compare slow operations against documented flows
  • Check: do the most-called paths match what you documented?
  • Check: are there high-frequency paths not in your diagrams?
Metric | What it validates
Endpoint call frequency | That your priority flow ranking is correct
Slowest operations | That complex flows are correctly identified
DB query count per request | That N+1 patterns are in your debt list
Async queue depth | That your async boundaries can handle load
Error rate per component | That failure-prone flows are correctly flagged

Failure signal: High-frequency production path not in your sequence diagrams → missing component or flow.

Gate 4 - Failure Validation

Goal: Confirm your architecture doc can explain what breaks and what survives under failure.

Walk each scenario against your doc. The doc should be able to answer "what breaks?" and "what's the fallback?"

Scenario 1: Primary database unavailable
→ Which components fail immediately?
→ Which can degrade gracefully?
→ Is there a circuit breaker? Does the doc show it?

Scenario 2: External payment/API service returns 503
→ Does the sequence diagram show the error path?
→ Does the consistency boundary roll back?
→ What is the user-visible impact?

Scenario 3: Message broker / queue unavailable
→ Which async flows stop?
→ Are async dependencies documented in the component diagram?
→ What is the fallback for each?

Scenario 4: Deployment / process restart mid-request
→ What state is lost?
→ Is the state inventory in the doc accurate?
→ What operations are not idempotent?

Failure signal: A known production incident from the past 12 months cannot be explained by the doc → the doc is wrong. Find the incident, trace what actually happened, update the doc.

Step 5 - Refinement (Closing the Loop)

Validation without refinement is just criticism. After each gate, findings must flow back into the document.

Gate | If gaps found, update these
Gate 1 - Static | Component diagram, layer violation inventory
Gate 2 - Developer | Component names, boundaries, anything disputed
Gate 3 - Runtime | Sequence diagrams, priority flow ranking, NFR section
Gate 4 - Failure | Error paths in sequence diagrams, fallback behavior, deployment notes

Refinement rules:

✅ Every gap found → update diagram OR add to debt list (never silently ignore)
✅ Every disputed boundary → redrawn with developer input
✅ Every unverified claim → marked [unverified] until confirmed
✅ Every production incident that the doc can explain → referenced as validation evidence
❌ Never mark validation complete if Gate 2 found disputes
❌ Never leave a known gap undocumented

Validation Cadence

When | What
Before publishing | Gates 1 + 2 - mandatory
Within 2 weeks of publishing | Gate 3
Quarterly | Gate 4 + developer re-review + Step 5
After major releases | All 4 gates + Step 5

An architecture doc not validated in 6 months should be treated as a hypothesis, not a fact.


AI-Assisted Workflow

Universal Chunking Strategy

  1. Start with the build/package file - establishes framework context for all subsequent prompts
  2. Chunk by module/package, not by file - intra-module relationships are where design decisions live
  3. Send handlers with their backing services - a handler without its service loses half the context
  4. Send integration clients with the services that call them - timeout and error handling decisions live in this pairing

What NOT to Do

❌ Don't send single files in isolation
   → Loses inter-module context. Always chunk by module.
❌ Don't trust LLM summaries blindly
   → LLMs hallucinate responsibility. Cross-check against actual imports
     before writing anything into the architecture doc.
❌ Don't generate architecture diagrams without validation
   → LLM-generated architecture has ~30% error rate on component
     relationships. Always run Gate 2 (developer review) after.
❌ Don't accept the first component decomposition
   → Apply compression rules yourself. LLMs stay too close to
     file/class names. Your job is to compress further.
❌ Don't skip the build file
   → Framework choice, version, and dependency graph are all in there.
     It's the fastest architectural signal in the codebase.

Universal Prompts by Artifact Type

For any service/use-case module

```
Analyze this module. Describe:
1. Its business responsibility in one sentence (capability, not class name)
2. The operations it exposes
3. Its dependencies (data access, external clients, other services)
4. Where it owns a consistency boundary (transaction, saga, etc.)
5. Any notable patterns or anti-patterns

Apply compression: name the capability, not the class.
```

For any handler/controller/route

```
Analyze this handler/route file. Describe:
1. The URL(s) and HTTP methods it handles
2. What input it reads (path params, query, body, headers, session/token)
3. What service/use-case it delegates to
4. What it returns or renders
5. Flag any business logic that should not be in the handler layer

Map it as a request flow: entry → middleware → handler → service → response.
```

For any data access layer (ORM model, repository, DAO)

```
Analyze this data access module. Describe:
1. The table(s) or collection(s) it maps to
2. The queries or operations it exposes
3. Any relationships and their loading strategy (eager/lazy)
4. Any identified N+1 query risks
5. Whether it contains business logic it shouldn't
```

For any external integration client/adapter

```
Analyze this integration client. Describe:
1. What external service it wraps (name it by role, not product)
2. The operations it exposes to the internal system
3. Timeout, retry, and circuit breaker configuration
4. How external errors are translated to internal domain errors
5. Whether the external API contract is hidden behind an interface
```

For any build / package manifest

```
Analyze this build file. Extract:
1. Language version and target runtime/platform
2. All major framework and library dependencies with versions
3. Any build plugins or tools that affect code generation or behavior
4. Module/package structure if multi-module
5. Any outdated, deprecated, or conflicting dependencies
```

For generating an ADR from a known decision

```
I need to write an Architecture Decision Record for the following decision:

Decision context: [describe the situation that forced the decision]
What was decided: [state the decision in one sentence]
When it was made: [date or approximate period]
Constraints at the time: [technical, resource, time, or org constraints]
Alternatives that were considered: [list them]

Generate a complete ADR using this structure:
1. Title (ADR-NNN: [imperative verb + subject])
2. Status: Accepted
3. Context (2–4 sentences - what was true at the time that forced this decision)
4. Decision (1–2 sentences, direct - no hedging)
5. Alternatives considered (table: option | why rejected)
6. Consequences (positive / negative / risks)
7. Review date (suggest a specific date based on the decision type)

Write it as if you are the engineer who made the decision, in past tense.
Be specific - avoid generic phrases like "to improve performance."
```

For translating a debt list to CTO / executive audience

```
Below is a technical architectural debt list from an engineering analysis.
Rewrite it for a CTO / VP Engineering audience using these rules:
1. Replace every technical term with its business impact
   (e.g. "N+1 query" → "database asks the same question N times under load")
2. Frame each item as: [what it is] → [what happens if we don't fix it] → [what fixing it enables]
3. Group items into three tiers:
   - Immediate risk (production incidents or security exposure likely)
   - Delivery friction (slowing feature development)
   - Strategic constraints (limiting future scaling or modernization)
4. For each item, provide: effort estimate in weeks, not story points
5. End with a recommended priority order and the business case for the top 3

Technical debt list to translate:
[paste your anti-pattern checklist results here]
```

For scoring anti-patterns on the risk × effort matrix

```
Below is a list of anti-patterns found in a codebase analysis.
Score each one on two dimensions and assign a decision:

Risk score (1–5):
  5 = Data loss, security breach, or production outage likely
  4 = Significant user impact or revenue loss possible
  3 = Degraded performance or increased incident frequency
  2 = Developer friction - slows delivery, doesn't break anything
  1 = Cosmetic / style - no functional impact

Effort score (1–5):
  5 = Architectural overhaul - months
  4 = Multi-sprint refactor - weeks, multiple teams
  3 = Single-sprint fix - 1–2 weeks, one team
  2 = Days of work - targeted, low risk
  1 = Hours - config change or single file

Decision rules:
  High risk + Low effort   → Fix Now
  High risk + Med effort   → Fix Next Sprint
  High risk + High effort  → Plan & Track
  Med risk  + Low effort   → Fix When Passing
  Med risk  + Med effort   → Backlog With Date
  Low/Med   + High effort  → Accept & Document
  Low risk  + Any effort   → Skip

For each item produce:
  Item | Risk score (with one-sentence justification) | Effort score (with one-sentence justification) | Decision

Anti-patterns to score:
[paste your checklist findings here]
```

For validating a sequence diagram against logs

```
I have a documented sequence diagram for [flow name]:
[paste your sequence diagram here]

And the following log excerpt from a production trace of the same flow:
[paste log excerpt here]

Compare them. For each hop in the sequence diagram:
1. Does it appear in the logs?
2. Are there hops in the logs that are NOT in the diagram?
3. Does the ordering match?
4. Do the error paths in the diagram match what the logs show?

Produce:
- A verified steps list (diagram matches log)
- A discrepancy list (diagram says X, log shows Y)
- Missing steps list (in logs but not in diagram)
- Recommended diagram updates
```

Synthesis Prompt - Deriving Architecture

```
Given these module summaries, apply the following compression rules:
- Collapse modules/classes into capabilities (not names)
- Collapse endpoints into use cases
- Collapse tables/collections into domain concepts
- Collapse integrations into roles

Then produce:
1. Architecture style (one of: layered monolith / modular monolith /
   microservices / event-driven / serverless / transitional)
2. Component diagram (ASCII or Mermaid) with 5–12 components max
3. Primary request flows as numbered step sequences
4. Key design patterns identified
5. Top 5 architectural debt items ranked by risk × effort

Format as an architecture overview document with explicit diagrams.
```
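
Once the synthesis step yields a capability map, the Mermaid component diagram can be generated mechanically before hand-editing. A minimal sketch, assuming the map is a plain dict of component → list of components it depends on (the `to_mermaid` helper is illustrative, not part of the guide):

```python
def to_mermaid(capabilities: dict) -> str:
    """Render a capability map (component -> components it depends on)
    as a Mermaid flowchart definition."""
    lines = ["flowchart TD"]
    # Declare each component as a node; Mermaid node ids cannot
    # contain spaces, so spaces become underscores.
    for component in capabilities:
        node_id = component.replace(" ", "_")
        lines.append(f"    {node_id}[{component}]")
    # Declare one edge per dependency.
    for component, deps in capabilities.items():
        for dep in deps:
            lines.append(f"    {component.replace(' ', '_')} --> {dep.replace(' ', '_')}")
    return "\n".join(lines)
```

Feeding in `{"Order Management": ["Payment Processing"], "Payment Processing": []}` produces a two-node flowchart with one arrow; the output still needs Gate 2 review like any generated diagram.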

Full Pipeline

| Stage | Input | Tool | Output |
|-------|-------|------|--------|
| 1. Static analysis | Source / compiled output | Stack-specific (see appendix) | Dependency graph |
| 2. Build file parsing | Package manifest | LLM | Framework + dependency inventory |
| 3. Module summarization | Source files (chunked by module) | LLM | Module summaries |
| 4. Compression | Module summaries | LLM + compression rules | Capability map (5–12 components) |
| 5. Pattern detection | Summaries + capability map | LLM + anti-pattern checklist | Pattern catalog, debt score |
| 6. Diagram generation | Capability map + dependency graph | LLM → Mermaid | Component, sequence, ER diagrams |
| 7. Validation | Diagrams + logs + developer | Gates 1–4 | Verified or flagged claims |
| 8. Refinement | Validation findings | Human | Updated diagrams + debt list |
| 9. Doc assembly | All above | LLM | Final architecture doc |

Output Document Template

Sections marked (stack-specific) should be filled in using your appendix.

01 - Executive Overview

System purpose, business context, user base. Architecture style in one paragraph. Key business capabilities supported.

02 - System Architecture

Architecture style. Component diagram (5–12 components). Runtime topology (app server/platform, DB, cache, message broker). Language, framework, runtime version. (stack-specific)

03 - Entry / Handler Layer (stack-specific)

Route/URL mapping table (path → handler). Middleware/filter chain with purpose of each. Session/token management strategy. Framework-specific patterns in use.

04 - Component Design

Capability table (not class names). Per-component: responsibility, interfaces, dependencies. Inter-component dependency diagram. Design patterns identified per component.

05 - Data Flow - Key Use Cases

Sequence diagram per flow (revenue-generating first). Consistency boundary shown per flow. Error handling paths. Async vs sync distinction.

06 - Data Model

ER or schema diagram (key entities only - 80/20 rule). Domain concept groupings. ORM/ODM relationship summary. Any raw query areas outside the ORM.

07 - Security Architecture

Authentication mechanism. Authorization model. Session/token security. Identified gaps. (stack-specific)

08 - External Integrations

All outbound calls grouped by role. Message brokers. Legacy interfaces. Timeout/retry per integration. Error/fallback behavior.

09 - Deployment and Operations (stack-specific)

Deployment unit and target platform. Config management approach. Logging and observability setup. Process management and scaling approach.

10 - Key Design Decisions and Tradeoffs

Why this architecture. Major framework decisions. Known tradeoffs. Regretted decisions.

11 - Architectural Debt and Modernization Map

Anti-pattern checklist score. Layer violation inventory. Refactoring priorities (ranked risk × effort). Migration paths under consideration.

12 - Constraints and Non-Functional Requirements

Performance (observed). Scalability limits. Maintainability. Test coverage gaps. Compliance requirements.


Minimum Viable Architecture Doc (When to Stop)

The guide tells you what to document. This section tells you when enough is enough.

The trap most engineers fall into: treating architecture documentation as a completeness exercise. It isn't. Stop when the document can answer the stakeholder's actual question. Every additional section beyond that point has diminishing returns.

The Minimum Viable Doc by Stakeholder Question

| Stakeholder question | Minimum artifacts required | You can skip |
|----------------------|---------------------------|--------------|
| "What does this system do?" | Executive overview (01) + component diagram (02) | Sections 03–12 |
| "Why is this slow / breaking?" | Key flows (05) + anti-pattern score (11) | Sections 01, 06–10, 12 |
| "Can we onboard a new engineer?" | Sections 01, 02, 03, 05 + stack appendix | Sections 07–12 |
| "Is it safe to go to production?" | Security (07) + deployment (09) + NFR (12) | Most of 03–06 |
| "Should we modernize / rewrite?" | Debt map (11) + design decisions (10) + NFR (12) | Sections 03–06 |
| "Full architecture review" | All 12 sections | Nothing |

The "Done Enough" Test

Before adding another section, ask:

1. Who specifically will read this section?
2. What decision will it help them make?
3. If I skip it, what is the worst realistic outcome?

If you cannot name a real person and a real decision for questions 1 and 2, stop. The section is for completeness, not utility.

Minimum Viable Doc: The 3-Artifact Floor

Regardless of stakeholder, every architecture output must have at least these three artifacts before it can be called architecture (not just notes):

✅ 1. Component diagram (5–12 components, capability-named)
   → Answers: what are the moving parts?
✅ 2. One end-to-end sequence diagram (the primary revenue flow)
   → Answers: what actually happens at runtime?
✅ 3. Debt score from the anti-pattern checklist
   → Answers: how healthy is it, and what's the biggest risk?

Everything else is depth on top of this floor. If you have time for only one thing: produce these three artifacts and stop.

When to Expand Beyond the Floor

Add sections when a specific gap creates a specific risk:

| Gap | Risk if undocumented | Add section |
|-----|----------------------|-------------|
| Security mechanism unknown | Audit failure, breach | 07 - Security |
| Deployment process unclear | Production incident | 09 - Deployment |
| External dependencies untracked | Integration failure, vendor lock-in | 08 - Integrations |
| Data model not understood | Data loss, migration failure | 06 - Data Model |
| Performance constraints unknown | SLA breach | 12 - NFR |

Audience Adaptation Guide

The same architecture produces a different document depending on who reads it. Don't produce one document and hope everyone gets what they need from it. Adapt deliberately.

Audience Profiles

CTO / VP Engineering (Strategic)

What they need: Business impact, risk exposure, modernization cost, decision support.
Reading time available: 5–10 minutes.
Format: Executive brief, not technical deep-dive.

Include:
✅ Section 01 (Executive Overview) - expanded, business-language
✅ Section 02 (Architecture) - component diagram only, no code references
✅ Section 10 (Design Decisions) - framed as "why we built it this way"
✅ Section 11 (Debt Map) - framed as risk and cost, not technical violations
✅ Modernization options table (3 options with effort/risk/benefit)

Remove or move to appendix:
❌ All code snippets and bash commands
❌ Section 03 (Handler Layer detail)
❌ Section 06 (Data Model detail)
❌ Anti-pattern checklist raw output

Framing rule: Replace all technical terms with business impact statements.

```
❌ Technical: "14 JSP files contain scriptlets - layer violation"
✅ Business:  "14 UI components contain business logic - each change
              requires a developer instead of a designer, slowing
              feature delivery by ~2 days per change"
```

Engineering Team / New Engineers (Operational)

What they need: How to work in the system. What exists, where things live, what the rules are.
Reading time available: 30–60 minutes.
Format: Reference document they can return to.

Include:
✅ All 12 sections
✅ Stack appendix (their specific stack)
✅ Anti-pattern checklist (so they know what to avoid adding)
✅ Sequence diagrams with code-level detail
✅ Layer interaction rules (what is and isn't allowed)

Emphasise:
→ Section 03 (Handler Layer) - where to add new endpoints
→ Section 04 (Component Design) - which service to call for what
→ Section 05 (Data Flows) - how to trace a bug through the system
→ Section 11 (Debt Map) - what not to copy from existing code

Security / Compliance Auditor

What they need: Evidence of controls, identified gaps, data flows for sensitive data.
Reading time available: 20–30 minutes on their priority sections.
Format: Structured, citable, gap-explicit.

Include:
✅ Section 07 (Security Architecture) - primary section, fully detailed
✅ Section 05 (Data Flows) - annotated with: what data, where it goes, who can see it
✅ Section 08 (Integrations) - every outbound call with auth mechanism
✅ Section 09 (Deployment) - config management, secrets handling
✅ Section 12 (NFR) - compliance requirements explicitly called out

Format requirements:
→ Every security control must state: what it protects, how it's enforced, where it breaks
→ Gaps must be listed explicitly - do not omit or soften them
→ Use "identified gap" not "area for improvement"

External Consultant / Modernization Assessor

What they need: Current state, debt score, change risk, migration options.
Reading time available: 1–2 hours, deep read.
Format: Complete picture with no gaps hidden.

Include:
✅ All 12 sections
✅ Actual vs intended architecture gap analysis (Section 11)
✅ Full anti-pattern checklist results - raw, unfiltered
✅ Architecture style identification (Pass 3, Step 2)
✅ Known failure modes and production incidents referenced

Critical: do not sanitize the debt map for this audience. A consultant working from an optimistic debt list will recommend the wrong modernization path.

Adaptation Checklist

Before sharing any architecture doc, confirm:

[ ] I know who will read this
[ ] I have removed or summarised sections they won't use
[ ] Technical terms affecting non-technical readers have been translated
[ ] The debt/risk section is framed in the language this audience uses
[ ] The document is the right length for their available reading time

Debt Risk × Effort Matrix

Finding debt is not enough. The hardest question is: fix it now, or document it and live with it?

This matrix answers that question with a structured decision framework.

Step 1 - Score Each Debt Item

For each item from your anti-pattern checklist, score it on two axes:

Risk score (1–5): What is the blast radius if this goes wrong?

| Score | Meaning |
|-------|---------|
| 5 | Data loss, security breach, or production outage likely |
| 4 | Significant user impact or revenue loss possible |
| 3 | Degraded performance or increased incident frequency |
| 2 | Developer friction - slows delivery, doesn't break anything |
| 1 | Cosmetic / style - no functional impact |

Effort score (1–5): How much work to fix it?

| Score | Meaning |
|-------|---------|
| 5 | Architectural overhaul - months, high coordination cost |
| 4 | Multi-sprint refactor - weeks, affects multiple teams |
| 3 | Single-sprint fix - 1–2 weeks, one team |
| 2 | Days of work - targeted, low risk |
| 1 | Hours - a config change or a single file |

Step 2 - Place on the Matrix

```
                    EFFORT →
        Low (1-2)     Medium (3)     High (4-5)
       ┌─────────────┬─────────────┬─────────────┐
High   │  FIX NOW    │  FIX NEXT   │  PLAN &     │
(4-5)  │  (quick win │  SPRINT     │  TRACK      │
  R    │  high value)│             │             │
  I    ├─────────────┼─────────────┼─────────────┤
  S    │  FIX WHEN   │  BACKLOG    │  ACCEPT &   │
  K    │  PASSING    │  WITH DATE  │  DOCUMENT   │
  ↑    │             │             │             │
Low    ├─────────────┼─────────────┼─────────────┤
(1-2)  │  SKIP /     │  SKIP       │  SKIP       │
       │  AUTO-LINT  │             │             │
       └─────────────┴─────────────┴─────────────┘
```

Step 3 - Apply the Decision Rules

FIX NOW (High risk, Low effort)
Do this before the next release. These are the "free" wins - disproportionate risk reduction for minimal cost.
Example: Raw SQL string with user input (SQL injection risk, 2-hour fix)

FIX NEXT SPRINT (High risk, Medium effort)
Schedule explicitly. Do not let these sit in a backlog with no date - they will never happen.
Example: No circuit breaker on payment gateway calls (production incident risk, 1-week fix)

PLAN & TRACK (High risk, High effort)
These require a modernization project. Quantify the risk annually and use it to justify the investment.
Example: Monolith preventing independent scaling of payment service

FIX WHEN PASSING (Medium/Low risk, Low effort)
Fix opportunistically - when a developer is already in that file for another reason.
Example: Handler method at 60 lines, should be 40

BACKLOG WITH DATE (Medium risk, Medium effort)
Add to backlog with a real review date. If the date passes without action, escalate the risk score.
Example: 14 view files containing business logic - functional, but slowing delivery

ACCEPT & DOCUMENT (Low/Medium risk, High effort)
Explicitly accept this debt. Document it as a known constraint, not a gap. Include it in onboarding so new engineers understand it's intentional.
Example: Legacy SOAP integration that would take months to replace - system works fine with it

SKIP (Low risk, any effort)
Do not spend architectural attention here. Let linters handle it.
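
When scoring a long debt list, it helps to apply these rules mechanically so no item gets an inconsistent decision. A minimal sketch of the rules above (the function name and score cutoffs are illustrative; it includes the "Low risk → Skip" override from the decision rules):

```python
def debt_decision(risk: int, effort: int) -> str:
    """Map a risk x effort score pair (each 1-5) to a decision,
    following the matrix rules: high risk = 4-5, medium = 3, low = 1-2;
    low effort = 1-2, medium = 3, high = 4-5."""
    if not (1 <= risk <= 5 and 1 <= effort <= 5):
        raise ValueError("scores must be between 1 and 5")
    if risk >= 4:                       # high risk
        if effort <= 2:
            return "Fix Now"
        if effort == 3:
            return "Fix Next Sprint"
        return "Plan & Track"
    if risk == 3:                       # medium risk
        if effort <= 2:
            return "Fix When Passing"
        if effort == 3:
            return "Backlog With Date"
        return "Accept & Document"
    return "Skip"                       # low risk (1-2), any effort
```

For example, `debt_decision(5, 1)` returns `"Fix Now"` - the SQL injection case in the worked example below scores exactly this way.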

Step 4 - Produce the Debt Register

For each item in "Fix Now" or "Fix Next Sprint," create a debt register entry:

```
Debt item:     [Name of the anti-pattern]
Location:      [File / module / layer where it occurs]
Risk score:    [1–5] - [one sentence explaining the risk]
Effort score:  [1–5] - [one sentence explaining the fix]
Decision:      [Fix Now / Fix Next Sprint / Plan / Accept / Skip]
Owner:         [Team or engineer responsible]
Target date:   [Specific date, not "Q3" - vague dates mean never]
Evidence:      [Link to the specific code location]
```

Worked Example - Debt Scoring in Practice

This example shows 6 real debt items scored, placed on the matrix, and assigned decisions. Use it to calibrate your own scoring.

| # | Debt item | Location | Risk | Effort | Decision |
|---|-----------|----------|------|--------|----------|
| 1 | Raw SQL concatenated with user input | ReportController.java:214 | 5 - SQL injection, data breach risk | 1 - Single parameterised query fix | FIX NOW |
| 2 | No timeout on payment gateway HTTP client | PaymentClient.java | 5 - Gateway hang = app hang, revenue loss | 2 - One config property | FIX NOW |
| 3 | OrderController calls InventoryDAO directly | OrderController.java:89 | 3 - Bypasses transaction scope, occasional inconsistency | 3 - Move call through OrderService | FIX NEXT SPRINT |
| 4 | 14 JSP files contain scriptlet business logic | WEB-INF/views/*.jsp | 2 - Slows delivery, doesn't break anything | 4 - Migrate to Thymeleaf, multi-sprint | BACKLOG WITH DATE |
| 5 | OrderService owns Order + Payment + Invoicing | OrderService.java | 3 - Single point of failure, deployment coupling | 5 - Domain split, architectural project | PLAN & TRACK |
| 6 | Handler method names inconsistently cased | AdminController.java | 1 - Cosmetic, no functional impact | 1 - Auto-fixable with linter rule | SKIP / AUTO-LINT |

Scoring notes for borderline cases:

- Risk 3 vs 4 - ask: "Has this caused a production incident in the past year?" Yes → score 4. No → score 3.
- Effort 3 vs 4 - ask: "Does fixing this require coordinating more than one team?" Yes → score 4. No → score 3.
- When in doubt, score higher on risk, lower on effort. Higher risk → more urgency to fix. Lower effort → easier to justify doing it now.

Debt register entries for the "Fix Now" items:

```
Debt item:     Raw SQL concatenated with user input
Location:      src/main/java/com/example/ReportController.java:214
Risk score:    5 - SQL injection vulnerability, direct data breach path
Effort score:  1 - Replace string concat with PreparedStatement (2 hours)
Decision:      Fix Now
Owner:         Platform team
Target date:   2025-02-14
Evidence:      github.com/org/repo/blob/main/.../ReportController.java#L214

---

Debt item:     No timeout on payment gateway HTTP client
Location:      src/main/java/com/example/PaymentClient.java
Risk score:    5 - Gateway hang blocks all payment threads, revenue stops
Effort score:  2 - Set connectTimeout + readTimeout in RestTemplate config
Decision:      Fix Now
Owner:         Payments team
Target date:   2025-02-14
Evidence:      github.com/org/repo/blob/main/.../PaymentClient.java#L31
```

Team & Org Context

The guide assumes one engineer doing the analysis alone. Real enterprise scenarios are messier. This section covers the human and political dimensions.

When Engineers Disagree on the Architecture

Disagreement about what the architecture is (not what it should be) is more common than it looks and has a specific cause: different engineers have different accurate views of different parts of the system. Both views are correct for their slice. The architecture doc must reconcile them, not pick a winner.

Resolution process:

Step 1: Map the disagreement precisely
   → "We disagree about whether X calls Y directly" is precise
   → "We disagree about the architecture" is not
   → If you can't write the disagreement in one sentence, it's not clear enough to resolve

Step 2: Go to the code, not to consensus
   → Check the actual import graph, not memory
   → Run the dependency analysis tool (see appendix)
   → The code is the ground truth - opinions are not

Step 3: Document both views if the code is transitional
   → "Intended: X → Service → Y. Actual: 3 of 14 handlers call Y directly. Migration is 60% complete."
   → Don't flatten a transitional state into a clean diagram

Step 4: If still unresolved after Step 2, the dispute is about the intended architecture, not the current one
   → Separate the two explicitly in the doc
   → "Current state" vs "target state" are different sections

What to Do with Sensitive Findings

Some findings are politically sensitive: a beloved senior engineer's module is the God service. The team lead's design decision from 2018 is now the biggest scalability constraint. A compliance gap exists that no one wants to own.

Principles for handling sensitive findings:

✅ Document the finding - omitting it makes the doc misleading
✅ Frame as system state, not personal failure
   → "The payment module has grown to own 4 distinct domains"
   → NOT "John's payment module violates single responsibility"
✅ Pair every finding with a recommended action
   → A finding without a path forward reads as blame
✅ Share findings with affected team leads before publishing broadly
   → No one should read about a problem with their code for the first time in a team-wide document
❌ Never soften findings to the point of hiding them
❌ Never name individual engineers in debt or violation lists

Presenting Architecture to Non-Technical Stakeholders

When presenting architecture findings to business stakeholders, the architecture doc is not the deliverable - a decision brief is.

Structure the brief as:

1. What the system does (1 paragraph, business language)
2. What the current constraints are (3 bullets, business impact framing)
3. What options we have (2–3 options, each with: cost / risk / benefit)
4. What we recommend (1 option, with rationale)
5. What we need from you (a decision, a resource, an approval)

The architecture doc is the evidence behind the brief. It should be available as an appendix, not the main document.

Translation rules for non-technical audiences:

| Technical term | Business translation |
|----------------|----------------------|
| Layer violation | "Code in the wrong place - means changes take longer and break more often" |
| God service | "One component doing too many things - a bottleneck and a single point of failure" |
| N+1 query | "The system asks the database the same question N times when once would do - causes slowdowns under load" |
| No circuit breaker | "If payment provider goes down, we go down too - no automatic fallback" |
| Session affinity | "Users are tied to a specific server - we can't add capacity during peak load without disrupting sessions" |
| Tech debt score | "Every point is a risk we're carrying - at 16+, we're spending more managing debt than building features" |

Architecture Evolution & Decision Records

Architecture is not a snapshot - it is a living system. A document that captures the architecture today without capturing how it got here and where it's going will mislead the team within 6 months.

Architecture Decision Records (ADRs)

An ADR is a short document that captures a single architectural decision: what was decided, why, what alternatives were considered, and what the consequences are.

Why ADRs matter:
Every legacy codebase has patterns that look wrong to new engineers. Without ADRs, those engineers refactor them - only to discover six months later why the original decision was made. ADRs prevent that cycle.

ADR format:

```
# ADR-[number]: [Decision title]

**Date:** YYYY-MM-DD
**Status:** Proposed | Accepted | Deprecated | Superseded by ADR-[N]
**Deciders:** [Names or teams involved]

## Context
What situation forced this decision? What constraints existed?
(2–4 sentences. Be specific about what was true at the time.)

## Decision
What was decided?
(1–2 sentences. State it directly - no hedging.)

## Alternatives considered
| Option | Why rejected |
|--------|--------------|
| [Option A] | [Reason] |
| [Option B] | [Reason] |

## Consequences
**Positive:** What does this enable?
**Negative:** What does this constrain or cost?
**Risks:** What could go wrong because of this decision?

## Review date
When should this decision be revisited? (Set a real date.)
```

ADR examples for common enterprise decisions:

- ADR-001: Use layered monolith instead of microservices
- ADR-002: Use Spring MVC over REST-only API
- ADR-003: Store sessions in DB instead of in-memory
- ADR-004: Accept SOAP integration with ERP system (not replace)
- ADR-005: Use JPA/Hibernate for all data access

Where to Store ADRs

```
docs/
  architecture/
    adr/
      ADR-001-monolith-over-microservices.md
      ADR-002-spring-mvc-choice.md
      ADR-003-session-storage.md
    diagrams/
      component-diagram-v3.mermaid
      order-flow-sequence.mermaid
    architecture-guide.md   ← this document
```

Keep ADRs in the same repository as the code they govern. If the code moves, the ADRs move with it.

Versioning the Architecture Document

When the architecture changes significantly, don't overwrite the previous version - capture the evolution.

Version triggers - create a new architecture version when:

[ ] A new service or major component is added or removed
[ ] The deployment model changes (e.g., monolith → modular, on-prem → cloud)
[ ] A major external dependency changes (new payment provider, new DB)
[ ] The authentication model changes
[ ] A significant debt item is resolved (document the before/after)
[ ] The team structure changes in a way that reflects in component ownership

Version header to add to every architecture doc:

```
## Document Version History

| Version | Date | Author | Summary of changes |
|---------|------|--------|--------------------|
| v1.0 | 2019-03 | [name] | Initial architecture capture |
| v1.1 | 2020-08 | [name] | Added payment service, updated sequence diagrams |
| v2.0 | 2022-11 | [name] | Migrated from Struts to Spring MVC - full re-capture |
| v3.0 | 2024-06 | [name] | Extracted notification module, updated debt map |
```

Architecture Drift Detection

Architecture drift - the gap between documented and actual architecture - accumulates silently. Build a lightweight mechanism to detect it.

Automated drift signals:

```
# Run monthly - compare import graph against last documented state
# If new cross-layer imports appear, flag for review
jdeps -dotoutput ./deps-current target/myapp.jar
diff deps-baseline.dot deps-current/myapp.dot

# Count anti-pattern checklist items - track trend over time
# Increasing score = drift is accelerating
grep -c "^\[x\]" architecture-checklist-current.txt
```
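
A raw `diff` of `.dot` files is noisy when nodes are reordered. A minimal sketch of edge-level comparison, assuming jdeps-style `"a" -> "b";` edge lines (the `new_edges` helper is illustrative, not part of the guide):

```python
import re

# Matches quoted edge declarations such as: "app.web" -> "app.dao";
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')

def new_edges(baseline_dot: str, current_dot: str) -> set:
    """Return dependency edges present in the current .dot output but
    absent from the baseline - candidate drift to review."""
    def edges(dot: str) -> set:
        return set(EDGE.findall(dot))
    return edges(current_dot) - edges(baseline_dot)
```

Any returned edge that crosses a layer boundary (e.g. a handler package pointing at a DAO package) is a drift signal worth a Gate 2 review.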

Human drift signals - any of these should trigger an architecture review:

[ ] A developer says "I didn't know that module existed"
[ ] A bug required changes in 4+ unrelated modules
[ ] A new hire's mental model of the system is significantly wrong after onboarding
[ ] A production incident exposed a dependency not in the architecture doc
[ ] The team is debating "how it works" rather than "how to improve it"
[ ] An integration test exercises a path not in any sequence diagram

Drift review cadence:

| Trigger | Action |
|---------|--------|
| Monthly | Run automated import graph diff - flag new cross-layer violations |
| Quarterly | Gate 4 validation (failure scenarios) + human drift signal check |
| After major feature | Update sequence diagrams for affected flows |
| After incident | Trace the incident path through the doc - update where it diverged |
| After team change | Re-validate component ownership - teams and components should align |

Pro Tips (Language-Agnostic)

  • Apply compression rules before writing any component name - ask "is this a capability or a class?"
  • Stop when the document can answer the stakeholder's question - completeness is not the goal, utility is
  • The middleware/filter chain is your cross-cutting concerns map - document it before anything else in the handler layer
  • Document all state stores explicitly - session, JWT, DB, cache, in-memory - this is the hidden scalability constraint
  • Config files from the system's early years are the most valuable artifacts - they capture the intended architecture before drift
  • Write one ADR for every decision you wish had been documented when you arrived - future engineers will thank you
  • Treat the debt map as a deliverable, not a footnote - frame it in risk×effort language that stakeholders can act on
  • Start with the happy path per flow, then expand to error paths - error handling reveals the real complexity
  • Never name engineers in debt or violation lists - findings are about the system state, not about people
  • If it cannot be drawn, it is not architecture - every claim must have a corresponding box or arrow
  • An architecture doc not validated in 6 months is a hypothesis - re-validate before any major decision is based on it


Stack Appendices

Each appendix provides the stack-specific details for every step in the core guide.


Appendix A - Java / Spring / JSP

Entry Points

| Type | Files / annotations |
|------|---------------------|
| HTTP | web.xml, @Controller, @RestController, @WebServlet, DispatcherServlet |
| IoC root | applicationContext.xml, @Configuration, @ComponentScan, ejb-jar.xml |
| Background jobs | @Scheduled, QuartzJobBean, @MessageDriven, @JmsListener |
| Build root | pom.xml, build.gradle, settings.gradle |

Module Layer Mapping

| Layer | Typical packages | Key types |
|-------|------------------|-----------|
| Handler | *.web, *.controller, *.action | Servlet, Controller, ActionForm |
| Orchestration | *.service, *.business, *.facade, *.ejb | @Service, @Stateless, Session Bean |
| Data access | *.dao, *.repository | @Repository, JpaRepository, JdbcTemplate |
| Domain | *.model, *.domain, *.entity | @Entity, POJO |
| Integration | *.integration, *.client, *.adapter | RestTemplate, WebServiceTemplate |
| Cross-cutting | *.aspect, *.security, *.util | @Aspect, Filter, HandlerInterceptor |

Dependency Analysis Tools

```
# Module-level dependency graph
jdeps --print-module-deps -recursive target/myapp.jar
jdeps -dotoutput ./deps target/myapp.jar

# Full transitive dependency tree
mvn dependency:tree -Dverbose
mvn help:effective-pom

# Circular dependency detection
# Add to pom.xml:
# <rule implementation="org.apache.maven.enforcer.rules.dependency.BanCircularDependencies"/>

# Architecture rule enforcement (as JUnit tests)
# ArchUnit: com.tngtech.archunit
```

JSP / Web Tier Specific Scanning

# Find scriptlets (business logic in views - architectural debt)
grep -rn "<%[^@!]" src/main/webapp --include="*.jsp"

# Find session state (scalability constraint inventory)
grep -rn "session\.setAttribute" src/ --include="*.java" --include="*.jsp"

# Find JNDI lookups (service locator pattern - legacy signal)
grep -rn "InitialContext\|lookup(" src/main/java --include="*.java"

# Find raw SQL strings (injection risk + query inventory)
grep -rn "\"SELECT\|\"INSERT\|\"UPDATE\|\"DELETE" src/ --include="*.java"

Anti-Pattern Additions (Java-Specific)

[ ] JSP contains <% %> scriptlets with business logic
[ ] @Transactional placed on DAO methods (should be on service)
[ ] JDBC calls with manual commit/rollback outside service layer
[ ] HttpSession stores non-serializable domain entities
[ ] N+1 queries from LAZY fetch in a loop (Hibernate)
[ ] Spring XML config overridden by annotation config inconsistently

Runtime Observation Tools

Tool | Purpose
jdeps | Module dependency graph from bytecode
ArchUnit | Architecture rules as JUnit tests
jQAssistant | Graph-based codebase analysis (Neo4j)
JVisualVM / JConsole | Thread dumps, heap, JMX metrics
Async Profiler | CPU/allocation profiler, low overhead
p6spy / datasource-proxy | JDBC call logging with parameters
Hibernate Statistics | N+1 detection, cache hit rate
Prometheus + Micrometer | Metrics if Spring Actuator present

AI Prompt Additions

For a JSP file:

Analyze this JSP. Describe:
1. What model attributes it expects
2. Any business logic in scriptlets - list each occurrence
3. Which forms it submits and to which action URLs
4. Which other JSPs it includes
5. Classify: pure view / mixed view-logic / heavy logic (migration needed)

For pom.xml:

Analyze this Maven POM. Extract:
1. Java version and target runtime (Tomcat / JBoss / WebLogic)
2. All major framework dependencies with versions
3. Build plugins affecting behavior (aspectj, code generation, etc.)
4. Multi-module structure
5. Dependency conflicts or end-of-life libraries

Appendix B - Python / Django / FastAPI

Entry Points

Type | Files / decorators
HTTP (Django) | urls.py, views.py, @api_view, ViewSet
HTTP (FastAPI) | main.py, @app.get, @app.post, APIRouter
HTTP (Flask) | app.py, @app.route, Blueprint
IoC / config | settings.py, config.py, INSTALLED_APPS
Background jobs | celery.py, @shared_task, @app.task, APScheduler
Build root | requirements.txt, pyproject.toml, Pipfile

Module Layer Mapping

Layer | Typical locations | Key types
Handler | views.py, routers/, api/ | View, ViewSet, APIRouter, endpoint function
Orchestration | services/, use_cases/, business/ | Plain Python classes/functions
Data access | repositories/, models.py (queries only) | QuerySet, ORM Manager, raw SQL
Domain | models.py (structure), domain/, entities/ | Django Model, dataclass, Pydantic model
Integration | clients/, adapters/, integrations/ | requests, httpx, boto3 clients
Cross-cutting | middleware/, decorators/, utils/ | Django Middleware, FastAPI Dependency

Dependency Analysis Tools

# Import graph (pip install pydeps)
pydeps src/myapp --max-bacon=3 --cluster

# Circular import detection
pip install pylint
pylint --disable=all --enable=cyclic-import src/

# Dead code / unused imports
pip install vulture
vulture src/

# Dependency tree
pip install pipdeptree
pipdeptree

# Architecture rules as enforceable import contracts
pip install import-linter
# Define contracts in .importlinter config

Django / FastAPI Specific Scanning

# Find business logic in views (fat view detection)
grep -rn "def get\|def post\|def put\|def delete" */views.py | wc -l

# Find raw SQL (injection risk)
grep -rn "raw(\|cursor.execute\|RawSQL" . --include="*.py"

# Find direct model access in views (bypasses service layer)
grep -rn "\.objects\." */views.py --include="*.py"

# Find N+1 risks (queryset in loop)
grep -rn "for.*in.*\:" . --include="*.py" -A2 | grep "\.objects\."

# Django: check for missing select_related / prefetch_related
grep -rn "\.objects\.filter\|\.objects\.all" . --include="*.py" | \
  grep -v "select_related\|prefetch_related"

Anti-Pattern Additions (Python-Specific)

[ ] Business logic in views.py / route handlers (fat views)
[ ] Direct .objects. queryset calls inside view functions
[ ] Missing select_related / prefetch_related (Django N+1)
[ ] settings.py contains secrets (not env vars)
[ ] Celery tasks contain business logic (should delegate to service)
[ ] Django signals used for core business logic (obscures flow)
[ ] Synchronous external HTTP calls inside async endpoint handlers
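The first two checklist items share one fix: move the rule out of the handler. Below is a minimal, framework-free sketch of the fat-view anti-pattern and its service-layer refactor; all names (`Order`, `checkout_view_*`, `apply_vip_discount`, the 10% discount) are invented for illustration, not taken from any real codebase.

```python
from dataclasses import dataclass


# Hypothetical domain object - names invented for illustration.
@dataclass
class Order:
    total: float
    is_vip: bool


# Anti-pattern: the pricing rule lives inside the request handler ("fat view").
def checkout_view_fat(order: Order) -> dict:
    if order.is_vip:
        order.total *= 0.9  # business rule buried in the view layer
    return {"total": order.total}


# Fix: a service function owns the rule; the view only translates in/out.
def apply_vip_discount(order: Order) -> Order:
    """Service layer: the only place the discount rule exists."""
    if order.is_vip:
        order.total *= 0.9
    return order


def checkout_view_thin(order: Order) -> dict:
    return {"total": apply_vip_discount(order).total}
```

The same shape applies to a Django view or FastAPI endpoint: the handler parses the request and renders the response, while the `services/` module owns behavior, so the rule is testable without the web framework.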

Runtime Observation Tools

Tool | Purpose
pydeps | Module dependency graph
import-linter | Architecture rule enforcement
Django Debug Toolbar | Query count, timing per request
django-silk | Request/response profiling
Flower | Celery task monitoring
py-spy | Sampling profiler, low overhead
Prometheus + django-prometheus | Metrics

Appendix C - Node.js / Express / NestJS

Entry Points

Type | Files / decorators
HTTP (Express) | app.js, server.js, routes/, router.use()
HTTP (NestJS) | main.ts, @Controller, @Module, AppModule
HTTP (Fastify) | app.js, fastify.register(), route plugins
Config / DI | app.module.ts (NestJS), container.js (custom DI)
Background jobs | bull queues, node-cron, @nestjs/schedule, @Processor
Build root | package.json, yarn.lock, tsconfig.json

Module Layer Mapping

Layer | Typical locations | Key types
Handler | controllers/, routes/, *.controller.ts | Express Router, NestJS Controller
Orchestration | services/, *.service.ts, use-cases/ | Plain class, NestJS Service
Data access | repositories/, *.repository.ts, models/ | TypeORM Repository, Mongoose Model
Domain | entities/, domain/, *.entity.ts | TypeORM Entity, Mongoose Schema
Integration | clients/, adapters/, *.client.ts | axios/got wrappers, SDK clients
Cross-cutting | middleware/, guards/, interceptors/, pipes/ | NestJS Guard, Interceptor, Middleware

Dependency Analysis Tools

# Module dependency graph
npx depcruise --include-only "^src" --output-type dot src | dot -Tsvg > deps.svg

# Circular dependency detection
npx madge --circular src/

# Unused exports / dead code
npx ts-prune

# Outdated packages
npm outdated
npx npm-check-updates

# Bundle analysis (if applicable)
npx webpack-bundle-analyzer

Express / NestJS Specific Scanning

# Find route definitions (entry point inventory)
grep -rn "router\.\(get\|post\|put\|delete\|patch\)" src/ --include="*.js" --include="*.ts"

# Find direct DB calls in controllers (layer violation)
grep -rn "\.find\|\.save\|\.query\|\.execute" src/controllers/ --include="*.ts"

# Find missing async error handling
grep -rn "async.*req.*res" src/routes/ --include="*.js" | grep -v "try\|catch"

# Find untyped any (TypeScript debt)
grep -rn ": any" src/ --include="*.ts" | wc -l

Anti-Pattern Additions (Node.js-Specific)

[ ] Business logic in Express route handlers directly
[ ] Missing async error handling (unhandled promise rejections)
[ ] Synchronous file I/O (fs.readFileSync) inside request handlers
[ ] Missing input validation before DB operations
[ ] No connection pooling configured for DB client
[ ] Secrets hardcoded in source (not process.env)
[ ] Callback hell - nested callbacks instead of async/await
[ ] No rate limiting on public endpoints

Runtime Observation Tools

Tool | Purpose
depcruise | Module dependency graph and rule enforcement
madge | Circular dependency detection
clinic.js | CPU, memory, async profiling
0x | Flame graph profiler
Bull Board | Queue monitoring
Pino / Winston | Structured logging
Prometheus + prom-client | Metrics

Appendix D - .NET / C# / ASP.NET

Entry Points

Type | Files / attributes
HTTP | Program.cs, Startup.cs, [ApiController], [Route], MapControllers()
IoC / DI root | Program.cs (.AddScoped, .AddSingleton), appsettings.json
Background jobs | IHostedService, BackgroundService, Hangfire, Quartz.NET
Build root | *.csproj, *.sln, NuGet.config, Directory.Build.props

Module Layer Mapping

Layer | Typical locations | Key types
Handler | Controllers/, Endpoints/, *.Controller.cs | ControllerBase, MinimalAPI handler
Orchestration | Services/, Application/, UseCases/ | Plain C# class, MediatR Handler
Data access | Repositories/, Data/, *.Repository.cs | EF Core DbContext, Dapper queries
Domain | Domain/, Entities/, Models/ | POCO, record types, value objects
Integration | Infrastructure/, Clients/, Adapters/ | HttpClient wrappers, SDK clients
Cross-cutting | Filters/, Middleware/, Behaviors/ | ActionFilter, Middleware, MediatR Pipeline

Dependency Analysis Tools

# NDepend (commercial) - comprehensive .NET dependency analysis

# dotnet-depends (free)
dotnet tool install -g dotnet-depends
dotnet depends

# Circular reference detection
# ReSharper / Rider: built-in architecture diagram

# Outdated packages
dotnet list package --outdated

# Unused references
# ReSharper / Rider: "Remove Unused References" refactoring

Anti-Pattern Additions (.NET-Specific)

[ ] Business logic in Controller action methods
[ ] DbContext injected directly into Controllers (bypasses repository)
[ ] Missing cancellation token propagation in async methods
[ ] Synchronous .Result or .Wait() calls on async methods (deadlock risk)
[ ] Missing using statements / DbContext not disposed (connection leak)
[ ] Secrets in appsettings.json committed to source control
[ ] Missing EF Core AsNoTracking() on read-only queries
[ ] N+1 queries from missing .Include() in EF Core

Runtime Observation Tools

Tool | Purpose
NDepend | Architecture rule enforcement, dependency graph
dotMemory / dotTrace | Memory and CPU profiling
MiniProfiler | Per-request DB query profiling
Application Insights | APM, distributed tracing
Seq | Structured log analysis
Hangfire Dashboard | Background job monitoring

Appendix E - Ruby on Rails

Entry Points

Type | Files / conventions
HTTP | config/routes.rb, app/controllers/, *_controller.rb
Config / init | config/application.rb, config/initializers/, Gemfile
Background jobs | app/jobs/, *_job.rb, Sidekiq workers, Resque
Build root | Gemfile, Gemfile.lock, .ruby-version

Module Layer Mapping

Layer | Typical locations | Key types
Handler | app/controllers/ | ApplicationController subclasses
Orchestration | app/services/, app/interactors/, app/use_cases/ | Plain Ruby objects (POROs)
Data access | app/models/ (ActiveRecord queries) | ActiveRecord model query methods
Domain | app/models/ (structure), app/domain/ | ActiveRecord model, value objects
Integration | app/clients/, app/adapters/ | Faraday, HTTParty wrappers
Cross-cutting | app/concerns/, lib/, app/middleware/ | Concern, Rack Middleware

Dependency Analysis Tools

# Gem dependency tree
bundle viz   # generates dependency graph image (requires graphviz)

# Autoloading / dependency structure check (Rails 6+, Zeitwerk)
bin/rails zeitwerk:check

# Dead code detection
gem install debride
debride app/

# Vulnerable gem audit
gem install bundler-audit
bundle audit

# Rails-specific: check for N+1
gem install bullet  # add to Gemfile (development group)

Anti-Pattern Additions (Rails-Specific)

[ ] Fat controller - business logic beyond params + render in controller
[ ] Fat model - ActiveRecord model with 500+ lines of non-persistence logic
[ ] Logic in views / ERB templates
[ ] Direct ActiveRecord queries in controllers (bypasses service layer)
[ ] N+1 queries - missing .includes() / .eager_load()
[ ] Callbacks (before_save, after_create) containing business logic
[ ] God model - one ActiveRecord class owning unrelated domains
[ ] Missing database indexes on foreign keys and frequently queried columns

Runtime Observation Tools

Tool | Purpose
bundle viz | Gem dependency graph
Bullet | N+1 query detection
rack-mini-profiler | Request profiling
Skylight / Scout APM | Production APM
Sidekiq Web | Background job monitoring
PgHero | PostgreSQL query analysis

Appendix F - Go

Entry Points

Type | Files / patterns
HTTP | main.go, cmd/, internal/handler/, http.HandleFunc, chi.Router, gin.Engine
Config | config/, internal/config/, env-based config structs
Background jobs | goroutine launchers in main.go, internal/worker/, cron packages
Build root | go.mod, go.sum, Makefile

Module Layer Mapping

Layer | Typical locations | Key types
Handler | internal/handler/, internal/api/ | http.HandlerFunc, gin/chi handler
Orchestration | internal/service/, internal/usecase/ | Plain Go struct with methods
Data access | internal/repository/, internal/store/ | Interface + concrete DB implementation
Domain | internal/domain/, internal/model/ | Go structs, value types
Integration | internal/client/, internal/adapter/ | HTTP clients, SDK wrappers
Cross-cutting | internal/middleware/, pkg/ | Middleware functions, shared utilities

Dependency Analysis Tools

# Module dependency graph
go mod graph

# Import cycle detection (built into go build)
go build ./...    # fails on circular imports

# Static analysis
go vet ./...

# Dead code
go install golang.org/x/tools/cmd/deadcode@latest
deadcode ./...

# Dependency visualization
go install github.com/kisielk/godepgraph@latest
godepgraph ./... | dot -Tpng -o deps.png

# Outdated dependencies
go list -m -u all

Anti-Pattern Additions (Go-Specific)

[ ] Business logic in http.HandlerFunc directly
[ ] Missing context propagation (ctx not passed through call chain)
[ ] Goroutine leak - goroutine started with no cancellation mechanism
[ ] Missing error wrapping (errors.Wrap / fmt.Errorf %w)
[ ] Global state in package-level variables (not safe for concurrent use)
[ ] Interface not defined at point of use (defined in implementation package)
[ ] Missing graceful shutdown handling (http.Server.Shutdown)
[ ] init() functions with side effects (hidden initialization order)

Runtime Observation Tools

Tool | Purpose
go tool pprof | CPU, memory, goroutine profiling
go tool trace | Execution trace, goroutine scheduling
expvar / pprof HTTP endpoint | Runtime metrics
OpenTelemetry Go | Distributed tracing
Prometheus + promhttp | Metrics
golangci-lint | Comprehensive static analysis


Appendix G - AI Prompt Library (Complete Reference)

This appendix consolidates every prompt in the guide into a single reference. Copy, adapt, and chain these prompts across your LLM tool of choice. Each prompt is self-contained - paste the prompt, fill in the bracketed sections, and send.


Section 1 - Analysis Prompts (Pass 1 & 2)

G.1 - Service / use-case module

Analyze this module. Describe:
1. Its business responsibility in one sentence (capability, not class name)
2. The operations it exposes
3. Its dependencies (data access, external clients, other services)
4. Where it owns a consistency boundary (transaction, saga, etc.)
5. Any notable patterns or anti-patterns

Apply compression: name the capability, not the class.

[paste module source code here]

G.2 - Handler / controller / route

Analyze this handler/route file. Describe:
1. The URL(s) and HTTP methods it handles
2. What input it reads (path params, query, body, headers, session/token)
3. What service/use-case it delegates to
4. What it returns or renders
5. Flag any business logic that should not be in the handler layer

Map it as a request flow: entry → middleware → handler → service → response.

[paste handler source code here]

G.3 - Data access layer (ORM model, repository, DAO)

Analyze this data access module. Describe:
1. The table(s) or collection(s) it maps to
2. The queries or operations it exposes
3. Any relationships and their loading strategy (eager/lazy)
4. Any identified N+1 query risks
5. Whether it contains business logic it shouldn't

[paste data access source code here]

G.4 - External integration client / adapter

Analyze this integration client. Describe:
1. What external service it wraps (name it by role, not product)
2. The operations it exposes to the internal system
3. Timeout, retry, and circuit breaker configuration
4. How external errors are translated to internal domain errors
5. Whether the external API contract is hidden behind an interface

[paste client source code here]

G.5 - Build / package manifest

Analyze this build file. Extract:
1. Language version and target runtime/platform
2. All major framework and library dependencies with versions
3. Any build plugins or tools that affect code generation or behavior
4. Module/package structure if multi-module
5. Any outdated, deprecated, or conflicting dependencies

[paste build file contents here]

G.6 - Configuration / IoC wiring file

Analyze this configuration file. Extract:
1. What components/beans/services are registered
2. What environment-specific values are present vs externalised
3. Any security-sensitive config (credentials, secrets, connection strings)
4. Cross-cutting concerns configured here (logging, auth, caching, tracing)
5. What this config tells us about the intended architecture

[paste config file here]

Section 2 - Synthesis Prompts (Pass 3)

G.7 - Derive architecture from module summaries

Given these module summaries, apply the following compression rules:
- Collapse modules/classes into capabilities (not names)
- Collapse endpoints into use cases
- Collapse tables/collections into domain concepts
- Collapse integrations into roles

Then produce:
1. Architecture style (one of: layered monolith / modular monolith /
   microservices / event-driven / serverless / transitional)
2. Component diagram (ASCII or Mermaid) with 5–12 components max
3. Primary request flows as numbered step sequences
4. Key design patterns identified
5. Top 5 architectural debt items ranked by risk × effort

Format as an architecture overview document with explicit diagrams.

[paste module summaries here]

G.8 - Generate Mermaid sequence diagram

Based on these component summaries, generate a Mermaid sequence diagram
for the [flow name] flow.

Include every hop: entry point → middleware → handler → service(s) →
data access → external calls → response.

For each hop show:
- The component name
- The operation called
- Whether it is synchronous or async (use -->> for async)
- Where the consistency boundary begins and ends (add a note)
- What happens on failure at this hop (add alt block)

[paste component summaries here]

G.9 - Generate component dependency diagram (Mermaid)

Based on these module summaries, generate a Mermaid graph TD component
diagram showing:
1. All components as nodes (5–12 max, capability-named)
2. Dependency arrows (A --> B means A depends on B)
3. External systems as a different node shape
4. Group components by layer using subgraph blocks

Label each arrow with the relationship type:
  calls | reads | writes | emits | consumes

[paste module summaries here]

Section 3 - Decision & Documentation Prompts

G.10 - Generate an ADR from a known decision

I need to write an Architecture Decision Record for the following decision:

Decision context: [describe the situation that forced the decision]
What was decided: [state the decision in one sentence]
When it was made: [date or approximate period]
Constraints at the time: [technical, resource, time, or org constraints]
Alternatives that were considered: [list them]

Generate a complete ADR using this structure:
1. Title (ADR-NNN: [imperative verb + subject])
2. Status: Accepted
3. Context (2–4 sentences - what was true at the time that forced this decision)
4. Decision (1–2 sentences, direct - no hedging)
5. Alternatives considered (table: option | why rejected)
6. Consequences (positive / negative / risks)
7. Review date (suggest a specific date based on the decision type)

Write it as if you are the engineer who made the decision, in past tense.
Be specific - avoid generic phrases like "to improve performance."

G.11 - Score anti-patterns on the risk × effort matrix

Below is a list of anti-patterns found in a codebase analysis.
Score each one on two dimensions and assign a decision.

Risk score (1–5):
  5 = Data loss, security breach, or production outage likely
  4 = Significant user impact or revenue loss possible
  3 = Degraded performance or increased incident frequency
  2 = Developer friction - slows delivery, doesn't break anything
  1 = Cosmetic / style - no functional impact

Effort score (1–5):
  5 = Architectural overhaul - months
  4 = Multi-sprint refactor - weeks, multiple teams
  3 = Single-sprint fix - 1–2 weeks, one team
  2 = Days of work - targeted, low risk
  1 = Hours - config change or single file

Decision rules:
  High risk + Low effort   → Fix Now
  High risk + Med effort   → Fix Next Sprint
  High risk + High effort  → Plan & Track
  Med risk  + Low effort   → Fix When Passing
  Med risk  + Med effort   → Backlog With Date
  Low/Med   + High effort  → Accept & Document
  Low risk  + Any effort   → Skip

For each item produce a table row:
  Item | Risk (score + one-sentence justification) | Effort (score + justification) | Decision

Anti-patterns to score:
[paste your checklist findings here]
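The decision rules in G.11 are mechanical enough to compute without an LLM, which is useful for spot-checking its scoring. The sketch below is one possible encoding, under two stated assumptions: scores 1–2 band as "low", 3 as "med", 4–5 as "high"; and since "Low risk + Any effort → Skip" is listed last, it wins over "Low/Med + High effort → Accept & Document" for low-risk items.

```python
def band(score: int) -> str:
    """Map a 1-5 score onto low/med/high bands (assumed cut-offs: 1-2/3/4-5)."""
    if score <= 2:
        return "low"
    if score == 3:
        return "med"
    return "high"


# Decision table transcribed from prompt G.11.
DECISIONS = {
    ("high", "low"): "Fix Now",
    ("high", "med"): "Fix Next Sprint",
    ("high", "high"): "Plan & Track",
    ("med", "low"): "Fix When Passing",
    ("med", "med"): "Backlog With Date",
    ("med", "high"): "Accept & Document",
    # "Low risk + Any effort -> Skip" takes precedence for low-risk items.
    ("low", "low"): "Skip",
    ("low", "med"): "Skip",
    ("low", "high"): "Skip",
}


def decide(risk: int, effort: int) -> str:
    """Return the triage decision for a risk/effort score pair (each 1-5)."""
    return DECISIONS[(band(risk), band(effort))]
```

For example, a risk-5/effort-1 finding (a security hole fixable with a config change) maps to "Fix Now", while a risk-3/effort-5 item maps to "Accept & Document".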

G.12 - Translate debt list for CTO / executive audience

Below is a technical architectural debt list from an engineering analysis.
Rewrite it for a CTO / VP Engineering audience using these rules:

1. Replace every technical term with its business impact
   Examples:
   - "N+1 query" → "database asks the same question N times under load - causes slowdowns at peak"
   - "God service" → "one component doing too many things - a bottleneck and single point of failure"
   - "No circuit breaker" → "if payment provider goes down, we go down with it - no automatic fallback"
2. Frame each item as:
   [What it is in plain language] → [What happens if we don't fix it] → [What fixing it enables]
3. Group items into three tiers:
   - Immediate risk (production incidents or security exposure)
   - Delivery friction (slowing feature development)
   - Strategic constraints (limiting future scaling or modernization)
4. For each item provide effort in weeks, not story points
5. End with:
   - Recommended priority order (top 3 with business case)
   - Total estimated remediation cost (engineering weeks)
   - "If we do nothing" scenario in 12 months

Technical debt list:
[paste your anti-pattern checklist results here]

G.13 - Generate executive architecture brief (1-pager)

Based on this architecture analysis, produce a 1-page executive brief
for a CTO / VP Engineering audience.

Structure it exactly as:
1. What this system does (2 sentences, business language only)
2. Current architecture (1 sentence naming the style + 1 key diagram reference)
3. Current constraints (3 bullets, each: constraint → business impact)
4. Options (2–3 options, each: name | effort | risk | what it enables)
5. Recommendation (1 option with 2-sentence rationale)
6. What we need from you (one specific decision or resource)

Rules:
- No technical jargon
- No code references
- Each bullet max 2 lines
- Total length: fits on one A4 page

Architecture analysis input:
[paste your architecture doc sections 01, 10, 11 here]

Section 4 - Validation Prompts

G.14 - Validate sequence diagram against logs

I have a documented sequence diagram for [flow name]:

[paste your sequence diagram here]

And the following log excerpt from a production trace of the same flow:

[paste log excerpt here]

Compare them. For each hop in the sequence diagram:
1. Does it appear in the logs? (yes / no / partially)
2. Are there log entries for hops NOT in the diagram?
3. Does the ordering match?
4. Do the error paths in the diagram match what the logs show?

Produce four lists:
- Verified steps (diagram matches log)
- Discrepancies (diagram says X, log shows Y - explain each)
- Missing from diagram (in logs but not documented)
- Recommended diagram updates

G.15 - Identify architecture drift between two doc versions

I have two versions of an architecture document.

Version A (older - the baseline):
[paste older component diagram / capability table here]

Version B (current - what we believe is true now):
[paste current component diagram / capability table here]

Identify:
1. Components added (in B, not in A)
2. Components removed (in A, not in B)
3. Dependency changes (arrows that changed direction or were added/removed)
4. Responsibility changes (same component, different scope)5. Architecture style changes (if the overall style shifted)

For each change, assess:
- Is this intentional (planned evolution) or drift (unplanned accumulation)?
- Does it require a new ADR?
- Does it require updating sequence diagrams?
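Items 1–3 of the drift check are pure set arithmetic, so you can pre-compute them and let the LLM focus on intent (items 4–5 and the intentional-vs-drift assessment). A possible sketch, assuming you have parsed each doc version into a `{component: set of dependencies}` map (the representation is an assumption, not part of the guide's templates):

```python
def diff_architecture(old: dict, new: dict) -> dict:
    """Compare two {component: set-of-dependency-names} maps.

    Covers items 1-3 of prompt G.15: components added, components
    removed, and per-component dependency changes.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    dependency_changes = {}
    for comp in set(old) & set(new):  # components present in both versions
        gained = new[comp] - old[comp]
        lost = old[comp] - new[comp]
        if gained or lost:
            dependency_changes[comp] = {
                "added": sorted(gained),
                "removed": sorted(lost),
            }
    return {
        "components_added": added,
        "components_removed": removed,
        "dependency_changes": dependency_changes,
    }
```

Paste the resulting diff into the G.15 prompt as pre-verified ground truth; the model then only has to judge which changes were planned evolution and which were drift.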

G.16 - Review architecture against failure scenario

Here is my architecture documentation for [system name]:

[paste component diagram + key sequence diagrams here]

Walk through this failure scenario:

Scenario: [describe the failure - e.g., "Primary database becomes unavailable for 5 minutes"]

For each component in the architecture:
1. Is it directly affected? (yes / no / cascades from another)
2. What is the user-visible impact?
3. Does the architecture documentation show a fallback or circuit breaker?
4. Does the documented behavior match what would actually happen?

Produce:
- Impact map (which components fail, cascade, or survive)
- Documentation gaps (what the doc doesn't answer about this scenario)
- Recommended additions to sequence diagrams or deployment section

Section 5 - Prompt Chaining Guide

These prompts are designed to be chained. Here is the recommended sequence for a full analysis:

Day 1 - Structure (Pass 1)
  G.5 (build file) → G.6 (config) → G.3 (data access) → G.1 (services)

Day 2 - Behavior (Pass 2)
  G.2 (handlers) → G.4 (integrations) → G.8 (sequence diagrams)

Day 3 - Abstraction (Pass 3)
  G.7 (derive architecture) → G.9 (component diagram) → G.11 (score debt)

Day 4 - Documentation
  G.10 (ADRs for top 3 decisions) → G.12 (exec translation) → G.13 (brief)

Day 5 - Validation
  G.14 (validate sequences vs logs) → G.16 (failure scenarios) → refine

Chaining rule: Always feed the output of one prompt as context into the next. Never start a new analysis prompt from scratch - accumulated context is what makes LLM-assisted architecture work.

Context window management: If your analysis spans more than ~20 modules, split into two chains: one for the service/orchestration layer, one for the data/integration layer. Merge outputs at the G.7 synthesis step.
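The chaining rule can be expressed as a small driver loop. This is a sketch, not a prescribed tool: `call_llm` is a placeholder parameter for whatever client you use (OpenAI SDK, a local model, or manual copy-paste replaced by a function), and the context-separator text is invented.

```python
def run_chain(prompts, call_llm, context=""):
    """Run (name, prompt) pairs in order, feeding each prompt the
    accumulated output of every previous step - the chaining rule
    from Section 5.

    `call_llm` is any function that takes a prompt string and returns
    the model's response text.
    """
    outputs = []
    for name, template in prompts:
        prompt = template
        if context:
            # Never start from scratch: append everything learned so far.
            prompt += "\n\n--- Accumulated context ---\n" + context
        response = call_llm(prompt)
        outputs.append((name, response))
        context += f"\n\n## {name}\n{response}"  # carry forward for next step
    return outputs
```

For the >20-module case, run two instances of this loop (service/orchestration layer and data/integration layer) and concatenate both accumulated contexts as the input to the G.7 synthesis step.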


References

Foundational Architecture Literature

  • Bass, L., Clements, P., & Kazman, R. (2012). Software Architecture in Practice (3rd ed.). Addison-Wesley. - The foundational text on architecture documentation and the ADD (Attribute-Driven Design) method. The quality attribute approach underpins the NFR section of this guide.

  • Fowler, M. (2002). Patterns of Enterprise Application Architecture. Addison-Wesley. - Repository, Service Layer, Data Mapper, and Unit of Work patterns that appear throughout the Java and .NET appendices.

  • Evans, E. (2003). Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley. - The aggregate root, bounded context, and domain concept vocabulary used in Compression Rule 3 (collapse tables into domain concepts).

  • Newman, S. (2021). Building Microservices (2nd ed.). O'Reilly. - Service decomposition, inter-service contracts, and the distributed systems anti-patterns in the microservices callout.

  • Richardson, C. (2018). Microservices Patterns. Manning. - Saga pattern, circuit breaker, and API Gateway patterns referenced in the microservices and serverless anti-pattern checklists.

Architecture Decision Records

  • Nygard, M. (2011). Documenting Architecture Decisions. cognitect.com/blog/2011/11/15/documenting-architecture-decisions - The original ADR format proposal. The ADR template in this guide is adapted directly from Nygard's structure.

  • Keeling, M. (2017). Design It! From Programmer to Software Architect. Pragmatic Bookshelf. - The "just enough architecture" philosophy that shaped the Minimum Viable Architecture Doc section.

Technical Debt and Code Quality

  • Cunningham, W. (1992). The WyCash Portfolio Management System (OOPSLA '92). - The original technical debt metaphor. The debt register and risk×effort matrix extend this framing into a prioritisation tool.

  • Kerievsky, J. (2004). Refactoring to Patterns. Addison-Wesley. - The pattern recognition approach used in Pass 3 design pattern identification.

  • Feathers, M. (2004). Working Effectively with Legacy Code. Prentice Hall. - The legacy codebase characterization that informed the anti-pattern checklist and the actual-vs-intended architecture gap analysis.

Validation and Observability

  • Humble, J., & Farley, D. (2010). Continuous Delivery. Addison-Wesley. - The pipeline and validation gate structure that influenced the 5-gate validation loop.

  • Beyer, B., Jones, C., Petoff, J., & Murphy, N.R. (Eds.) (2016). Site Reliability Engineering. O'Reilly. - The failure scenario validation approach and the "error budget" framing referenced in Gate 4.

Stack-Specific References

Java / Spring / JEE

  • Walls, C. (2022). Spring in Action (6th ed.). Manning.
  • Johnson, R. et al. (2004). Expert One-on-One J2EE Design and Development. Wrox. - Session Facade, Transfer Object, and DAO patterns in Appendix A.

Python / Django / FastAPI

  • Percival, H., & Gregory, B. (2020). Architecture Patterns with Python. O'Reilly. - Repository pattern and dependency injection in Python, directly referenced in Appendix B.

Node.js

  • Casciaro, M., & Mammino, L. (2016). Node.js Design Patterns (2nd ed.). Packt. - Event-driven patterns and middleware chains referenced in Appendix C.

.NET / C#

  • Microsoft Docs. Application Architecture Guide. learn.microsoft.com - MediatR, CQRS, and Clean Architecture patterns in Appendix D.

Go

  • Butcher, M. (2016). Go Design Patterns. Packt. - Interface-at-point-of-use and dependency inversion patterns in Appendix F.

AI-Assisted Development

  • White, J. et al. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv:2302.11382. - Prompt structuring principles underlying the AI Prompt Library in Appendix G.

  • Khattab, O. et al. (2023). DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv:2310.03714. - The structured prompt chaining approach referenced in the Section 5 chaining guide.

Tooling Documentation

  • jdeps (Java): docs.oracle.com/en/java/javase/17/docs/specs/man/jdeps.html
  • ArchUnit: archunit.org/userguide/html/000_Index.html
  • jQAssistant: jqassistant.org/get-started
  • pydeps: pydeps.readthedocs.io
  • depcruise: github.com/sverweij/dependency-cruiser
  • madge: github.com/pahen/madge
  • NDepend: ndepend.com/docs
  • Bullet (Rails N+1): github.com/flyerhzm/bullet
  • golangci-lint: golangci-lint.run/usage/quick-start
  • OpenTelemetry: opentelemetry.io/docs
  • AsyncAPI Specification: asyncapi.com/docs/specifications/v3.0.0
  • OpenAPI Specification: spec.openapis.org/oas/v3.1.0


v7 - Added: serverless callout (function inventory, state inventory, cold start documentation, execution boundary tracing, 9 serverless anti-patterns); Appendix G - complete AI Prompt Library (16 prompts across 5 categories: analysis, synthesis, decision/documentation, validation, chaining guide); updated navigation table with serverless and prompt library entries; frontmatter, improved title/subtitle/audience, and references section.