Principal engineer's playbook · May 2026

Vibe Coding: Tools & Strategies to Work With a New Codebase

You've just been dropped into a codebase you didn't write. This is my field guide for using AI coding agents to do three things fast and safely: understand the system, get productive in it, and extend it without breaking what already works. It is a working playbook, not a vendor pitch.

The core bet of this playbook: an agent that can read 100,000 lines in seconds does not give you understanding — it gives you a faster path to it. Comprehension, judgment, and verification stay yours. The tools just shorten the loop.

Scope: day 0 → shipping features · Stance: comprehension-first, tool-agnostic · Horizon: 2026 tooling · By: Linh Truong

1 · The principal's problem

A principal or senior engineer is hired for leverage, not typing speed. The expensive failure is not "writes code slowly" — it's shipping a change whose blast radius you didn't understand. On a new codebase, you are operating with the least context you will ever have and the highest expectation of impact. AI agents collapse the time it takes to read and navigate code, but they also make it trivially easy to generate plausible changes you don't actually understand. That asymmetry is the whole game.

This paper treats AI coding tools as an accelerant for comprehension and a force-multiplier for safe change — not as an autopilot. The structure is a three-phase arc: build a faithful mental model (MAP), get a fast and trustworthy feedback loop running (RUN), then make small, reversible, well-verified changes (EXTEND). Everything else — tools, prompts, guardrails — hangs off that arc.

  • Day 1: running app + green tests is the real first milestone
  • 90% of onboarding risk lives in what you don't read
  • <200 lines per change you can actually review well
  • 2 loops: comprehension loop + safe-change loop are the whole method

Who this is for

Engineers joining an unfamiliar codebase — new job, new team, inherited service, or an acquisition — who already know how to build software and now want a disciplined way to apply AI agents to the onboarding problem. If you're new to vibe coding generally, read the companion note "Vibe Coding Strategy" first; this paper assumes the workflow and focuses on the new-codebase case.

2 · Thesis — comprehension before generation

The trap

Generation is cheap; understanding is not

The single most dangerous move on a new codebase is asking an agent to "add feature X" before you can predict where X should live and what it touches. The agent will happily produce code that compiles and even passes tests — while quietly violating an invariant, duplicating an existing abstraction, or bypassing the one validation path everything else relies on. Plausible ≠ correct.

The reframe

Use the agent to read, not just to write

The highest-ROI early use of an agent is explanation and navigation: "trace what happens when a request hits /checkout," "where is auth enforced," "draw the dependency graph of this module," "what would break if I changed this signature." You convert hours of grep-and-read into minutes — then you verify the answer against the source.

The one rule that prevents most disasters

Never accept a change you couldn't have described before the agent wrote it. If the agent's output surprises you, that's a signal to slow down and read — not a reason to trust it more because "it ran." Surprise is information.

Three things that stay human

Intent

What "done" means, which constraints are non-negotiable, and what you're explicitly not changing.

Judgment

Whether a proposed change fits the system's grain — its abstractions, idioms, and unwritten rules.

Verification

Reading the diff, exercising the behavior, and confirming nothing downstream silently changed.

3 · The MAP → RUN → EXTEND framework

Onboarding with AI agents has three phases, run roughly in order but revisited constantly. MAP builds an accurate mental model. RUN gives you a tight, trustworthy feedback loop. EXTEND is where you ship — small, reversible, verified. A feedback arrow runs the whole way back: every change teaches you more about the system, which you fold back into your map and your context files.

  1. MAP (build the mental model): repo topology, build & run, data & control flow, domain boundaries, the "why". Output: a map you can defend.
  2. RUN (hit the ground running): reproducible env, green build, tests + lint + types pass, fast inner loop, observability. Output: a trustworthy safety net.
  3. EXTEND (change without breaking): tracer change, small slices, characterization tests first, verify the diff, one commit per idea. Output: shipped, reversible value.
Feedback: every change updates your map & context files. Sequential to start, then continuously revisited.
Figure 1 — The onboarding arc. MAP and RUN are prerequisites for safe EXTEND; the loop never fully closes.

Phase | Question it answers | "Done enough" signal | Biggest risk if skipped
MAP | How does this system actually work? | You can sketch the architecture and predict where a new feature lives. | You build in the wrong place or duplicate an abstraction.
RUN | Can I change it and instantly know if I broke it? | App runs locally; full test/lint/type loop is green and fast. | You ship blind: no signal between "edited" and "broke prod."
EXTEND | How do I add value without regressions? | Feature merged behind tests; blast radius known and contained. | Silent regressions; reviewers can't follow a giant PR.

4 · Phase 1 · MAP — build the mental model

Understanding a codebase is layered. You can't reason about domain boundaries before you can run the thing, and you can't reason about the "why" before you understand the flow. Work bottom-up through the comprehension stack, using the agent to compress each layer — then verify against source.

  1. Topology (what's here): repo layout · languages · entry points · services · build tooling · README/docs
  2. Run & build (how it boots): install · env vars · how to start it · how tests run · CI pipeline
  3. Flow (how data moves): request lifecycle · key call paths · state & persistence · async/queues
  4. Domain (the model & boundaries): core entities · module seams · who owns what · public contracts
  5. The "why" (intent & history): decisions, ADRs, git history, Chesterton's fences
Depth of understanding increases from layer 1 to layer 5.
Figure 2 — The comprehension stack. Agents accelerate every layer, but the higher you go, the more verification matters.

The reconnaissance loop

For each layer, run a tight loop: ask → locate → read → verify → record. You ask and the agent locates; you do the reading and verifying; what you record feeds your context files (§7).

Prompts that map a repo fast
  • "Give me a tour of this repo: top-level structure, what each top-level dir is for, and the main entry points. Cite file paths."
  • "What are the 10 files I should read first to understand this system, and why each?"
  • "Trace the lifecycle of a request to POST /orders from entry to DB write. List every function with file:line."
  • "Where is authentication/authorization enforced? Show me every place a request can bypass it."
  • "Diagram the module dependencies for billing/. Which modules does it import, and who imports it?"
  • "What are the core domain entities and their relationships? Where are they defined?"

Verify the agent, always

Agents hallucinate file paths, invent functions, and over-confidently summarize. Treat every claim as a hypothesis with a citation:

  • Open the cited file — does the function exist and do what's claimed?
  • Cross-check with a real tool — grep / language-server "find references" beats vibes for "where is X used."
  • Ask for file:line citations and reject summaries without them.
  • Run it — the fastest correctness check is exercising the path, not re-reading the explanation.
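
The "citation as hypothesis" check above can be partly mechanized. A minimal sketch, assuming the agent cites locations as `path:line` relative to the repo root; `check_citations` and the regular expression are illustrative names, not part of any tool named here. It confirms only that the file and line exist and, optionally, that a claimed symbol appears on that line. It does not confirm that the code does what the agent says.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches citations such as "src/orders/service.py:42" (path, colon, line number).
CITATION = re.compile(r"(?P<path>[\w./-]+\.\w+):(?P<line>\d+)")


def check_citations(answer: str, repo_root: Path, symbol: Optional[str] = None) -> List[str]:
    """Return the problems found in an agent answer's file:line citations."""
    matches = list(CITATION.finditer(answer))
    if not matches:
        return ["no file:line citations found"]
    problems = []
    for m in matches:
        rel, line_no = m.group("path"), int(m.group("line"))
        path = repo_root / rel
        if not path.is_file():
            problems.append(f"{rel}: file does not exist")
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        if not 1 <= line_no <= len(lines):
            problems.append(f"{rel}:{line_no}: line out of range")
        elif symbol and symbol not in lines[line_no - 1]:
            problems.append(f"{rel}:{line_no}: '{symbol}' is not on that line")
    return problems
```

An empty list means the citations are at least plausible; reading the cited lines is still your job.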

Chesterton's Fence: before you let an agent delete or "simplify" something that looks pointless, find out why it exists. Odd code is often a scar from a real bug.

Tools that build the map

Job | Tool / technique | Why it beats reading line by line
Whole-repo Q&A | Agentic IDE/CLI (Claude Code, Cursor, Windsurf, Cody) | Reads & summarizes across files in seconds; answers "where / how / why" with citations.
Semantic code search | Sourcegraph, ripgrep (rg), Grep MCP, LSP "find references" | Ground-truth navigation: verifies the agent and finds all call sites, not just likely ones.
Dependency / import graph | madge, dependency-cruiser, go mod graph, pydeps, jdeps | Reveals module seams and cycles the prose summary will miss.
Repo map for the agent | Aider repo-map, ctags, tree-sitter symbol index | Gives the agent a compressed skeleton so it reasons about the whole repo, not just open files.
Architecture sketch | Mermaid/PlantUML generated by the agent, then hand-corrected | Forces the model out of your head and onto a diagram you can falsify.
History & intent | git log/blame, PR history, ADRs, CODEOWNERS | The "why" lives in history and ownership, not in the code itself.
Runtime truth | Debugger, request tracing, logs, strace/profilers | What actually executes > what you think executes. Settles every disagreement.
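
To make the "dependency / import graph" row concrete, here is a small sketch for a Python repo using only the standard library's `ast` module. It is not a substitute for the dedicated tools in the table; `import_graph` and `importers_of` are illustrative names, and the sketch assumes module names can be derived from file paths and ignores relative imports.

```python
import ast
from pathlib import Path
from typing import Dict, List, Set


def import_graph(repo_root: Path) -> Dict[str, Set[str]]:
    """Map each module (dotted name derived from its path) to the modules it imports."""
    graph: Dict[str, Set[str]] = {}
    for path in sorted(repo_root.rglob("*.py")):
        module = ".".join(path.relative_to(repo_root).with_suffix("").parts)
        deps = graph.setdefault(module, set())
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                deps.add(node.module)  # relative imports are skipped in this sketch
    return graph


def importers_of(graph: Dict[str, Set[str]], target: str) -> List[str]:
    """Reverse lookup: modules that import `target` or any of its submodules."""
    prefix = target + "."
    return sorted(
        module for module, deps in graph.items()
        if any(dep == target or dep.startswith(prefix) for dep in deps)
    )
```

`importers_of(graph, "billing")` answers the "who imports it?" half of the billing/ prompt above with something you can diff against the agent's answer.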

Read the tests first

The test suite is the cheapest executable specification you'll ever get. It tells you what behavior the team considers important, what the public contracts are, and how to call things correctly. Ask the agent: "Summarize what the tests in /tests assert about this module's behavior, grouped by feature."

5 · Phase 2 · RUN — hit the ground running

You cannot safely change what you cannot run. The goal of this phase is a fast, trustworthy feedback loop: a local environment that boots, a green test/lint/type baseline, and observability so you can see what the code does at runtime. This is the safety net that makes EXTEND non-scary. Getting the app running with green tests is your true Day-1 milestone — not your first feature.

Reproducible environment

Boot it, the boring way

  • Follow the README literally; when it fails (it will), feed the exact error to the agent: "This setup step failed with X — what's the fix, given this repo's config?"
  • Prefer the project's own path: devcontainer, Docker Compose, make setup, nix, or the documented script.
  • Capture every fix you discover into the setup docs / your context file — the next person (and the agent) needs it.
  • Confirm you can reach a working state: app starts, a request succeeds, the test suite runs.

The green baseline

Establish "known good" before touching anything

  • Run the full test suite on a clean checkout. Record what passes, what's flaky, what's skipped.
  • Run the linter, formatter, and type checker. Note the project's exact commands.
  • If tests are red on main, that's finding #1 — you need a green baseline or you can't tell your breakage from pre-existing breakage.
  • Time the loop. A 40-minute test run will quietly push you toward skipping verification — find the fast subset.
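
One way to record "what passes, what's flaky, what's skipped" is to run the suite a few times on a clean checkout and compare outcomes per test. The sketch below assumes you have already parsed each run into a mapping of test id to outcome (for example from your runner's report); `classify_baseline` and the outcome strings are illustrative, not any runner's API.

```python
from collections import defaultdict
from typing import Dict, List


def classify_baseline(runs: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Classify tests from repeated runs of the same commit.

    Each run maps a test id to its outcome: "passed", "failed", or "skipped".
    A test whose outcome differs between identical runs is reported as flaky.
    """
    outcomes = defaultdict(set)
    for run in runs:
        for test_id, outcome in run.items():
            outcomes[test_id].add(outcome)

    report: Dict[str, List[str]] = {"passed": [], "failed": [], "skipped": [], "flaky": []}
    for test_id in sorted(outcomes):
        seen = outcomes[test_id]
        if len(seen) > 1:
            report["flaky"].append(test_id)
        else:
            report.setdefault(next(iter(seen)), []).append(test_id)
    return report
```

Anything in `failed` on a clean main is pre-existing breakage; anything in `flaky` is a test whose red result you should not attribute to your change without a re-run.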

Wire the agent into your loop

An agent that can run your tests and read your logs can self-correct; one that can't will guess. This is the single biggest reliability lever in the RUN phase.

Give it commands

Tell the agent the exact build/test/lint/run commands. Put them in CLAUDE.md/AGENTS.md so it never guesses.

Give it eyes

Let it run tests and read failures, tail logs, hit endpoints. MCP servers for git, the DB, Sentry, and the browser turn "I think" into "I checked."

Give it a sandbox

A disposable branch, a scratch DB, and permission scoping so an over-eager agent can't touch prod, secrets, or shared state.

Map the danger zones early

Before you change anything, ask the agent and the team: "What parts of this system are fragile, undertested, or 'don't touch'? Where are the migrations, the money paths, the auth, the cron jobs, and the external integrations?" Knowing where the landmines are is half of not stepping on them.

# A minimal CLAUDE.md / AGENTS.md so the agent works *with* your loop
## Commands
build:   pnpm build
test:    pnpm test          # fast subset: pnpm test:unit
lint:    pnpm lint && pnpm typecheck
run:     pnpm dev             # app on :3000, requires .env.local

## Conventions
- Always run lint + typecheck + test before declaring a change done.
- Money/auth code lives in src/billing and src/auth — propose, don't auto-edit.
- Never edit db/migrations/* that are already applied; add a new migration.

## Don't touch
- infra/, .github/workflows/, anything reading PROD_* env vars.
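
Rules in a context file are advisory, so it can help to back them with a mechanical check. The sketch below assumes its input is the text printed by `git diff --name-status` and that the path patterns mirror the sample file above. `violations`, `NEVER_TOUCH`, and `APPEND_ONLY` are hypothetical names, and the "anything reading PROD_* env vars" rule is not covered because it cannot be decided from paths alone.

```python
from fnmatch import fnmatchcase
from typing import List

NEVER_TOUCH = ["infra/*", ".github/workflows/*"]  # from the "Don't touch" section
APPEND_ONLY = ["db/migrations/*"]                 # new files are fine, edits are not


def violations(name_status: str) -> List[str]:
    """Check `git diff --name-status` output against the path rules."""
    found = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")  # renames carry two paths
        for path in paths:
            if any(fnmatchcase(path, pat) for pat in NEVER_TOUCH):
                found.append(f"{path}: protected path")
            elif status != "A" and any(fnmatchcase(path, pat) for pat in APPEND_ONLY):
                found.append(f"{path}: existing migration changed ({status})")
    return found
```

Run as a pre-commit hook or CI step, a non-empty result becomes a hard stop instead of a convention the agent may forget.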

6 · Phase 3 · EXTEND — change without breaking

Now you ship. The discipline here is simple to state and hard to keep: small, reversible, verified changes that respect the system's grain. The agent makes generation fast, which makes it tempting to take big swings — resist. Big AI-authored PRs are where regressions hide and reviewers rubber-stamp.

The safe-change loop

  1. Pin behavior: characterization / golden-master tests
  2. Smallest slice: one vertical, reversible change; flag if risky
  3. Generate: agent edits + runs tests/build
  4. Verify: read the diff, run the app, try edge cases, existing tests still green
  5. Land: one commit per idea, why in the message
Repeat per slice; never one giant PR.
Guardrails wrap every step: version control · CI · feature flags · code review · scoped agent permissions · easy rollback
Figure 3 — The safe-change loop. Pinning behavior first is what lets you refactor and extend legacy code without fear.

Pin behavior first (legacy code)

Characterization tests = a seatbelt

If the code you're about to touch is undertested, write tests that capture current behavior before changing anything — even behavior you suspect is wrong. Now any change that alters behavior shows up as a failing test. This is Michael Feathers' core technique, and it's the thing that makes AI-assisted refactoring of legacy code safe.

Great agent use: "Write characterization tests that capture the current behavior of computeDiscount(), including edge cases, without judging whether it's correct."
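
A characterization test can be as simple as recording today's outputs and failing on any drift. In the sketch below, `compute_discount` is a made-up stand-in for the legacy function (its rules are invented for illustration), and `characterize` and the snapshot file are hypothetical helpers rather than a specific testing library.

```python
import json
from pathlib import Path
from typing import List


def compute_discount(total: float, tier: str) -> float:
    """Hypothetical stand-in for the legacy function being pinned."""
    if tier == "gold":
        return round(total * 0.2, 2)
    if total > 100:
        return 10.0
    return 0.0


# Inputs chosen to cover boundaries and suspicious cases (negative totals included).
CASES = [(0, "basic"), (100, "basic"), (100.01, "basic"), (50, "gold"), (-5, "gold")]


def characterize(snapshot: Path) -> List[str]:
    """Record current outputs on the first run; afterwards return the cases that drifted."""
    current = {f"{total}|{tier}": compute_discount(total, tier) for total, tier in CASES}
    if not snapshot.exists():
        snapshot.write_text(json.dumps(current, indent=2))
        return []
    pinned = json.loads(snapshot.read_text())
    return [case for case in current if current[case] != pinned.get(case)]
```

Note that the negative-total case pins a negative discount. That is probably a bug, and it is pinned anyway: the point is to detect change, not to bless the behavior.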

Work in vertical slices

The tracer-bullet first change

Your first feature change should be a thin end-to-end slice that touches every layer the way a real feature would — UI/API → service → data → test — but does the minimum. It proves you understand the wiring and exercises your whole RUN loop. Then thicken it.

  • Keep PRs reviewable: aim for changes a human can read in one sitting.
  • One commit per logical idea so git bisect and revert stay cheap.
  • Risky or wide-reaching change? Put it behind a flag and roll out gradually.
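
For the "behind a flag, rolled out gradually" case, a common approach is deterministic percentage bucketing. This is a sketch under the assumption that you have no flag service yet; if the codebase already has one, use it. `in_rollout` and the flag name are illustrative.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically decide whether a user is inside a percentage rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent


# Usage: the new path runs only for users inside the current percentage.
if in_rollout("cancel-order", "user-123", percent=10):
    pass  # call the new code path here
```

Because the bucket depends only on the flag and the user id, raising the percentage only adds users, and setting it to 0 is an immediate rollback with no deploy.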

Respect the grain

The fastest way to get a PR rejected — or to rot a codebase — is to introduce a pattern the team doesn't use. Before generating, tell the agent the local conventions and point it at an exemplar:

# Anchor the agent to existing patterns instead of its training-data defaults
"Add an endpoint to cancel an order. Follow the exact pattern in
 src/api/orders/refund.ts — same validation, error handling, and test layout.
 Use the existing OrderService; do not add a new data-access layer.
 Show me a plan first; don't edit until I approve."

Plan → approve → generate

For anything non-trivial, make the agent propose a numbered plan and the files it will touch before it writes code. You catch wrong-direction changes when they're a sentence to fix, not a 300-line diff to untangle. If you can't predict the outcome of each step, the plan isn't ready.
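
Once a plan is approved, it is cheap to check that the resulting diff stayed inside it. A minimal sketch, assuming you can list the files named in the plan and the files actually changed; `off_plan` is a hypothetical helper and the paths in the usage lines are illustrative.

```python
from typing import Dict, List, Set


def off_plan(planned_files: Set[str], changed_files: Set[str]) -> Dict[str, List[str]]:
    """Compare the files an approved plan named with the files the diff touched."""
    return {
        "unplanned": sorted(changed_files - planned_files),  # edits nobody approved
        "untouched": sorted(planned_files - changed_files),  # steps not carried out
    }


planned = {"src/api/orders/cancel.ts", "src/api/orders/cancel.test.ts"}
changed = {"src/api/orders/cancel.ts", "src/services/OrderService.ts"}
report = off_plan(planned, changed)
```

Here `report["unplanned"]` contains the OrderService file: exactly the kind of surprise that should send you back to the plan rather than forward to the merge.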

Refactoring at scale: the Mikado & Strangler patterns

Strangler Fig

To replace a subsystem, build the new path beside the old one and route traffic over incrementally, deleting the old only once nothing calls it. Agents excel at the mechanical "build the parallel path and migrate callers" work — you own the routing decisions.
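
The routing side of a Strangler Fig migration can be sketched as a facade that decides, per operation, which implementation answers. All names below (`StranglerRouter`, the operation strings, the handler signature) are assumptions made for illustration; real routing usually lives in a gateway, a flag service, or the call sites themselves.

```python
from typing import Callable, Dict, Set

Handler = Callable[[str, Dict], Dict]


class StranglerRouter:
    """Facade that sends each operation to the legacy or the new implementation."""

    def __init__(self, legacy: Handler, modern: Handler) -> None:
        self.legacy = legacy
        self.modern = modern
        self.migrated: Set[str] = set()

    def migrate(self, operation: str) -> None:
        self.migrated.add(operation)

    def rollback(self, operation: str) -> None:
        self.migrated.discard(operation)

    def handle(self, operation: str, payload: Dict) -> Dict:
        target = self.modern if operation in self.migrated else self.legacy
        return target(operation, payload)

    def legacy_is_unused(self, all_operations: Set[str]) -> bool:
        """True once every known operation routes to the new path."""
        return all_operations <= self.migrated
```

The old path is deleted only after `legacy_is_unused` holds for the full set of operations, and each `migrate` call is a decision you make, not the agent.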

Mikado Method

For a big refactor, attempt the change, note what breaks, revert, fix the prerequisites first, repeat. The agent can rapidly explore "what breaks if I do X" on a throwaway branch and report the dependency tree before you commit to an order.
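
The dependency tree that Mikado exploration produces is a graph of prerequisites, so a landing order falls out of a topological sort. The sketch uses the standard library's `graphlib` (Python 3.9+); `landing_order` and the goal names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def landing_order(prerequisites: Dict[str, Set[str]]) -> List[str]:
    """Order changes so every prerequisite lands before the change that needs it.

    Raises graphlib.CycleError if the recorded prerequisites form a cycle.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# Each key is a change; its value is the set of changes that must land first.
mikado = {
    "replace OrderRepository with OrderService": {
        "remove direct SQL in reports",
        "add OrderService.find_by_customer",
    },
    "remove direct SQL in reports": {"add OrderService.find_by_customer"},
    "add OrderService.find_by_customer": set(),
}
order = landing_order(mikado)
```

Leaves come first, the goal last; each entry then becomes its own small commit in the safe-change loop.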

7 · Context engineering for a new repo

On a new codebase your agent is as ignorant as you are — worse, it doesn't know it's ignorant. The leverage is in feeding it the right context: the conventions, commands, danger zones, and exemplars that aren't obvious from any single file. This is the highest-ROI investment of your first week, and it compounds: everything you learn in MAP and EXTEND flows back into these files.

Persistent context files

Write the repo's missing manual

Most agentic tools read a project rules file automatically: CLAUDE.md (Claude Code), AGENTS.md (an emerging cross-tool standard), .cursor/rules (Cursor), .windsurfrules (Windsurf), .github/copilot-instructions.md (GitHub Copilot). Put in them what a new senior hire would need on day one:

  • Exact build/test/lint/run commands.
  • Architectural shape and where things go.
  • Conventions, idioms, and "we don't do X here."
  • Danger zones and "don't touch" paths.
  • Links to ADRs, runbooks, and the deployment story.

Live context via MCP

Connect the agent to ground truth

The Model Context Protocol (MCP) lets agents pull live context and act through standardized servers instead of guessing. The high-value ones for onboarding:

  • Git/GitHub — history, PRs, issues, blame, ownership.
  • Code search (Sourcegraph/Grep) — verified navigation across the whole repo.
  • Database — real schema instead of an inferred one.
  • Observability (Sentry, logs, traces) — what actually breaks in prod.
  • Browser/preview — let the agent see the UI it's changing.
  • Issue trackers / docs (Jira, Linear, Notion) — the intent behind the work.

Context budget: relevant beats voluminous

More context isn't better — relevant context is. Dumping the whole repo into the window buries the signal and invites confident nonsense. Curate: the few files that matter, the exact conventions, one good exemplar. Use subagents or fresh sessions for independent sub-questions so one investigation doesn't pollute the next.
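
"Relevant beats voluminous" can be illustrated with a deliberately naive selector: rank files by keyword overlap with the question and stop at a budget. Everything here is an assumption for illustration, including `pick_context`, the whitespace tokenization, and the rough four-characters-per-token cost estimate; real tools use repo maps, embeddings, or search.

```python
from typing import Dict, List


def pick_context(files: Dict[str, str], query: str, budget_tokens: int) -> List[str]:
    """Greedily pick the most relevant files that fit within a token budget."""
    terms = set(query.lower().split())

    def score(text: str) -> int:
        return sum(1 for word in text.lower().split() if word in terms)

    def cost(text: str) -> int:
        return max(1, len(text) // 4)  # rough estimate, not a real tokenizer

    ranked = sorted(files, key=lambda path: (-score(files[path]), path))
    chosen, spent = [], 0
    for path in ranked:
        if score(files[path]) == 0:
            break  # the remaining files share no terms with the question
        if spent + cost(files[path]) <= budget_tokens:
            chosen.append(path)
            spent += cost(files[path])
    return chosen
```

The useful property is the refusal: files with no overlap are never included, however much budget is left.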

8 · Tool landscape (2026)

The tools cluster by autonomy (how much they do without you) and scope (a single edit vs. the whole repo). For onboarding, you mostly want the high-context, human-in-the-loop quadrant: tools that can read the whole repo and explain it, while keeping you in control of every change.

Axes: autonomy (human-in-loop → high autonomy) and scope (single edit → whole repo / multi-file).
  • Inline / chat assist: Copilot / Cursor tab (autocomplete); JetBrains AI · Gemini Code Assist · Amazon Q
  • Agentic, human-in-loop (the onboarding sweet spot: high context, you steer): Claude Code · Cursor (agent) · Windsurf; Aider · Cline · Continue (open-source CLIs); Sourcegraph Cody / Amp (code-search native)
  • Autonomous agents: Devin · OpenAI Codex (cloud) · Copilot agent; JetBrains Junie · Replit Agent
  • Greenfield / prototyping: v0 · bolt.new · Lovable (greenfield UI)
Figure 4 — Indicative positioning, 2026. For a brand-new codebase, favor the green band: whole-repo context with you approving each change. Reach for autonomous agents only once your map and safety net exist.
Legend: Inline / chat assist · Agentic, human-in-loop · Autonomous agents · Greenfield / prototyping

Picking tools by job

When you need to… | Reach for | Note for a new codebase
Understand & navigate a big repo | Claude Code, Cursor, Cody, Windsurf | Agentic context + citations; ask for file:line and verify.
Search a very large repo / monorepo precisely | Sourcegraph, ripgrep, Grep MCP | Ground truth that keeps the agent honest.
Make a focused multi-file change you'll review | Aider, Claude Code, Cursor agent | Plan → approve → diff. Best for EXTEND.
Stay in your editor for small edits | Copilot, Cursor tab, JetBrains AI | Fine once you understand the area; weak for first-contact comprehension.
Offload a well-specified, low-risk task | Devin, Codex (cloud), Copilot agent, Junie | Only after MAP + RUN exist and the task is boxed and verifiable.
Prototype a fresh UI / spike | v0, bolt.new, Lovable | Greenfield only; not for integrating into an existing system.

Tooling churn is real. Names, modes, and leaders shift every few months — treat the specifics above as a snapshot, not gospel. The categories and the selection logic (context depth × autonomy × your verification capacity) are what stay stable. Pick for the job, not the hype.

9 · The "don't break anything" decision flow

Every AI-generated change deserves the same question: do I trust this enough to land it? The answer is a function of how well you understand it, whether the safety net would catch a regression, and how big the blast radius is. This flow is the gate.

The agent proposes a change; three checks follow in order:
  1. Can I explain what it does & why? NO → read the diff / ask why; don't land it yet. YES → next check.
  2. Would a regression be caught by tests? NO → add characterization tests, then re-run. YES → next check.
  3. Blast radius small & reversible? NO → split / flag / stage it; shrink the change. YES → land it: one commit, why in the message.
Figure 5 — The gate before every merge. Three NOs to clear: understanding, safety net, blast radius. Any NO sends you back, not forward.

Understanding

If you can't explain the change in plain English, you can't own it. Read the diff or ask the agent to justify each hunk — then decide.

Safety net

Tests, types, and CI are what let you move fast. If a regression wouldn't be caught, your job is to make it catchable before landing.

Blast radius

Small + reversible = safe to ship and learn. Wide or irreversible = flag it, stage it, or split it until it isn't.
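
The three gates read naturally as a short function. This is only a sketch of the flow described above; the `Change` fields are judgments you supply by hand, and none of the names come from an existing tool.

```python
from dataclasses import dataclass


@dataclass
class Change:
    can_explain: bool                 # could you describe it in plain English?
    regression_would_be_caught: bool  # would tests, types, or CI fail if it broke something?
    small_and_reversible: bool        # is the blast radius contained and easy to revert?


def gate(change: Change) -> str:
    """Return the next action for a proposed change; any NO sends you back."""
    if not change.can_explain:
        return "read the diff / ask why; don't land it yet"
    if not change.regression_would_be_caught:
        return "add characterization tests, then re-run"
    if not change.small_and_reversible:
        return "split / flag / stage it; shrink the change"
    return "land it: one commit, why in the message"
```

The order matters: understanding is checked first because the other two answers are unreliable for a change you cannot explain.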

10 · Anti-patterns & failure modes

What goes wrong
  • Generate-before-understand: shipping a feature before you can predict where it lives.
  • Trusting the summary: believing the agent's prose over the actual source.
  • Hallucinated APIs: accepting calls to functions/endpoints that don't exist.
  • Mega-PR: one 2,000-line agent diff no human can truly review.
  • Convention drift: introducing patterns the team doesn't use, rotting consistency.
  • Deleting Chesterton's fences: "simplifying" code whose purpose you never learned.
  • No safety net: changing untested code with no way to detect regressions.
  • Autonomy too early: turning a cloud agent loose before MAP + RUN exist.

The antidote
  • Predict the location/blast radius before prompting.
  • Verify every claim against source; demand file:line citations.
  • Cross-check "where used" with grep / find-references.
  • Small slices; one commit per idea; reviewable diffs.
  • Point the agent at an exemplar and the rules file.
  • Ask "why does this exist?" before removing anything.
  • Pin behavior with characterization tests first.
  • Earn autonomy: box the task, ensure it's verifiable, then delegate.
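
The mega-PR failure mode is easy to catch mechanically. The sketch below counts added and removed lines in `git diff` output and compares the total with a review budget; the 200-line figure echoes the "<200 lines per change" guideline from §1, and `changed_lines` and `REVIEW_LIMIT` are hypothetical names. It assumes git's unified format, where each file section starts with a `diff --git` line.

```python
REVIEW_LIMIT = 200  # lines per change a human can review well (guideline, not a law)


def changed_lines(git_diff: str) -> int:
    """Count added and removed lines inside the hunks of `git diff` output."""
    count, in_hunk = 0, False
    for line in git_diff.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section begins; its headers are not changes
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            count += 1
    return count


def too_big_to_review(git_diff: str, limit: int = REVIEW_LIMIT) -> bool:
    return changed_lines(git_diff) > limit
```

Tracking hunk state rather than skipping every line that begins with `---` or `+++` keeps removed lines such as SQL comments from being mistaken for file headers.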

The meta-failure: false confidence

Every one of these traces back to the same root — the agent's fluency reads as competence, so you stop checking. A confident, well-formatted, runnable answer feels like a correct one. On a codebase you don't yet understand, that feeling is exactly the thing to distrust.

11 · First day / week / month / quarter

A concrete cadence for a principal joining a new codebase. Each horizon has one headline outcome.

  • Day 1: app runs locally; tests green
  • Week 1: architecture map + first tracer change merged
  • Month 1: real feature shipped; context files seeded
  • Quarter 1: trusted to lead changes; improving the system & loop
Figure 6 — Outcome-based horizons. The dates flex; the order doesn't.

D1 · Day one: get it running. Clone, set up the env, boot the app, run the full test suite. Establish the green baseline and the exact commands. Skim the README, top-level layout, and the test directory. Outcome: you can change a line and see it reflected, with tests as your tripwire.

W1 · Week one: map it and land a tracer. Use the agent to trace 2–3 core flows and draw the architecture; verify against source. Identify danger zones and owners. Ship one trivial, end-to-end change to prove the loop and meet code review. Start the CLAUDE.md/AGENTS.md.

M1 · Month one: ship something real. Take a genuine feature or fix through the full safe-change loop. Pin behavior with tests where coverage is thin. Fold everything you learned into the context files. You should now predict blast radius reliably for the areas you've touched.

Q1 · Quarter one: earn leverage. You're trusted to lead non-trivial changes and to say "no, not there." Start improving the system itself: the test loop's speed, the onboarding docs, the danger zones. Mentor others on the comprehension-first method.

12 · The principal's onboarding checklist

Print this. Tick it as you go. It's the whole paper compressed into actions.

MAP — understand

  • Toured the repo; know each top-level dir's purpose
  • Identified entry points & the 10 files to read first
  • Traced 2–3 core request/data flows, verified in source
  • Located where auth, money, and migrations live
  • Drew (and corrected) an architecture diagram
  • Read the tests as the executable spec
  • Skimmed git history / ADRs for the "why"
  • Asked the team where the bodies are buried

RUN — get productive

  • App boots locally via the documented path
  • Full test suite runs; recorded pass/flaky/skip
  • Lint, format, type-check commands known & green
  • Found the fast inner-loop test subset
  • Wired the agent to run tests & read logs
  • Set up a sandbox branch + scoped permissions
  • Started CLAUDE.md/AGENTS.md with commands & conventions
  • Connected useful MCP servers (git, search, DB, errors)

EXTEND — ship safely

  • Pinned behavior with characterization tests where thin
  • Made the agent plan before it edited
  • Pointed it at an exemplar + the rules file
  • Kept the change a small, reversible slice
  • Read every line of the diff
  • Ran the app + edge cases, not just unit tests
  • One commit per idea; the "why" in the message
  • Flagged / staged anything with wide blast radius

Always-on guardrails

  • Everything in version control before agent edits
  • Never accept a change you can't explain
  • Verify claims against source & real tools
  • Treat surprise as a signal to slow down
  • Ask "why does this exist?" before deleting
  • Earn autonomy; don't grant it by default
  • Feed every lesson back into context files
  • Keep a clean rollback path at all times

If you remember only one thing

Use the agent to understand faster and to change safer — never to skip understanding or skip verification. Comprehension, judgment, and the diff review stay yours. That's the line between a principal who ships confidently on a new codebase and one who becomes its next incident.

13 · References & further reading

A working bibliography behind this playbook — the ideas it leans on and the tools it references. Tooling docs move fast; treat versioned specifics as a 2026 snapshot.

Foundational ideas

  1. Karpathy, A.: coined "vibe coding," February 2025 (on X/Twitter). The origin of the term and the spectrum it describes.
  2. Feathers, M.: Working Effectively with Legacy Code (2004). Characterization tests, seams, and changing code you don't fully understand without breaking it.
  3. Fowler, M.: "StranglerFigApplication," martinfowler.com. Incremental replacement of a subsystem.
  4. Hunt & Thomas: The Pragmatic Programmer. Tracer bullets as thin end-to-end slices. (Chesterton's Fence itself comes from G. K. Chesterton's The Thing, 1929; it is used here as an engineering discipline.)
  5. Brandolini, A. / Evans, E.: Domain-Driven Design & EventStorming. Finding domain boundaries and the model in an unfamiliar system.
  6. Ellnestam & Brolund: The Mikado Method (2014). Disciplined large-scale refactoring via revert-and-prerequisite exploration.

Tools & protocols (2026)

  1. Anthropic: Claude Code docs and the Model Context Protocol (MCP) specification, modelcontextprotocol.io.
  2. Agentic IDEs/CLIs: Cursor, Windsurf (Codeium), Aider, Cline, Continue, Sourcegraph Cody/Amp. Project docs for rules files (AGENTS.md, .cursor/rules) and repo-map behavior.
  3. GitHub: Copilot (autocomplete, chat, agent mode) and Copilot coding agent documentation.
  4. Autonomous agents: Cognition Devin, OpenAI Codex (cloud), JetBrains Junie, Replit Agent, Amazon Q Developer, Google Gemini Code Assist.
  5. Navigation & analysis: Sourcegraph code search, ripgrep, madge / dependency-cruiser, tree-sitter, universal-ctags, Language Server Protocol "find references."
  6. Companion note: Truong, L., "Vibe Coding Strategy" (this paper's general-purpose sibling on working with AI agents in production).