You've just been dropped into a codebase you didn't write. This is my field guide for using AI coding agents to do three things fast and safely: understand the system, get productive in it, and extend it without breaking what already works. It is a working playbook, not a vendor pitch.
A principal or senior engineer is hired for leverage, not typing speed. The expensive failure is not "writes code slowly" — it's shipping a change whose blast radius you didn't understand. On a new codebase, you are operating with the least context you will ever have and the highest expectation of impact. AI agents collapse the time it takes to read and navigate code, but they also make it trivially easy to generate plausible changes you don't actually understand. That asymmetry is the whole game.
This paper treats AI coding tools as an accelerant for comprehension and a force-multiplier for safe change — not as an autopilot. The structure is a three-phase arc: build a faithful mental model (MAP), get a fast and trustworthy feedback loop running (RUN), then make small, reversible, well-verified changes (EXTEND). Everything else — tools, prompts, guardrails — hangs off that arc.
This paper is for engineers joining an unfamiliar codebase (a new job, a new team, an inherited service, or an acquisition) who already know how to build software and now want a disciplined way to apply AI agents to the onboarding problem. If you're new to vibe coding generally, read the companion note "Vibe Coding Strategy" first; this paper assumes the workflow and focuses on the new-codebase case.
The single most dangerous move on a new codebase is asking an agent to "add feature X" before you can predict where X should live and what it touches. The agent will happily produce code that compiles and even passes tests — while quietly violating an invariant, duplicating an existing abstraction, or bypassing the one validation path everything else relies on. Plausible ≠ correct.
The highest-ROI early use of an agent is explanation and navigation: "trace what happens when a
request hits /checkout," "where is auth enforced," "draw the dependency graph of this
module," "what would break if I changed this signature." You convert hours of grep-and-read into
minutes — then you verify the answer against the source.
Never accept a change you couldn't have described before the agent wrote it. If the agent's output surprises you, that's a signal to slow down and read — not a reason to trust it more because "it ran." Surprise is information.
What "done" means, which constraints are non-negotiable, and what you're explicitly not changing.
Whether a proposed change fits the system's grain — its abstractions, idioms, and unwritten rules.
Reading the diff, exercising the behavior, and confirming nothing downstream silently changed.
Onboarding with AI agents has three phases, run roughly in order but revisited constantly. MAP builds an accurate mental model. RUN gives you a tight, trustworthy feedback loop. EXTEND is where you ship — small, reversible, verified. A feedback arrow runs the whole way back: every change teaches you more about the system, which you fold back into your map and your context files.
| Phase | Question it answers | "Done enough" signal | Biggest risk if skipped |
|---|---|---|---|
| MAP | How does this system actually work? | You can sketch the architecture and predict where a new feature lives. | You build in the wrong place or duplicate an abstraction. |
| RUN | Can I change it and instantly know if I broke it? | App runs locally; full test/lint/type loop is green and fast. | You ship blind — no signal between "edited" and "broke prod." |
| EXTEND | How do I add value without regressions? | Feature merged behind tests; blast radius known and contained. | Silent regressions; reviewers can't follow a giant PR. |
Understanding a codebase is layered. You can't reason about domain boundaries before you can run the thing, and you can't reason about the "why" before you understand the flow. Work bottom-up through the comprehension stack, using the agent to compress each layer — then verify against source.
For each layer, run a tight loop: ask → locate → read → verify → record. The agent does the asking and locating; you do the reading and verifying; the recording feeds your context files (§7).
"… POST /orders from entry to DB write. List every function with file:line."
"… billing/. Which modules does it import, and who imports it?"
Agents hallucinate file paths, invent functions, and summarize overconfidently. Treat every claim as a hypothesis with a citation:
Chesterton's Fence: before you let an agent delete or "simplify" something that looks pointless, find out why it exists. Odd code is often a scar from a real bug.
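The "hypothesis with a citation" rule can be made mechanical: before believing a summary, check every file:line the agent cites against the source. The sketch below assumes citations look like `path/to/file.ts:42` and represents the repo as an in-memory map of path to contents (a stand-in for reading from disk). The names `parseCitations` and `checkCitation` are illustrative, not part of any tool.

```typescript
// One claim from the agent: "this symbol lives at file:line".
interface Citation {
  file: string;
  line: number; // 1-indexed, as editors usually report it
}

type Verdict = "confirmed" | "wrong-line" | "missing-file";

// Extract citations of the assumed form `path/to/file.ext:LINE` from agent output.
function parseCitations(agentText: string): Citation[] {
  const pattern = /([\w./-]+\.\w+):(\d+)/g;
  const found: Citation[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(agentText)) !== null) {
    found.push({ file: match[1], line: Number(match[2]) });
  }
  return found;
}

// Check one citation: does the file exist, and does the cited line mention the symbol?
// `repo` maps file paths to file contents (hypothetical stand-in for the working tree).
function checkCitation(
  citation: Citation,
  symbol: string,
  repo: Map<string, string>,
): Verdict {
  const source = repo.get(citation.file);
  if (source === undefined) return "missing-file"; // possibly a hallucinated path
  const cited = source.split("\n")[citation.line - 1];
  return cited !== undefined && cited.indexOf(symbol) !== -1
    ? "confirmed"
    : "wrong-line";
}
```

A "wrong-line" verdict is not proof the agent is wrong (code moves), but it is a cue to open the file yourself rather than trust the summary.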
| Job | Tool / technique | Why it beats reading line by line |
|---|---|---|
| Whole-repo Q&A | Agentic IDE/CLI (Claude Code, Cursor, Windsurf, Cody) | Reads & summarizes across files in seconds; answers "where / how / why" with citations. |
| Semantic code search | Sourcegraph, ripgrep (rg), Grep MCP, LSP "find references" | Ground-truth navigation — verifies the agent and finds all call sites, not just likely ones. |
| Dependency / import graph | madge, dependency-cruiser, go mod graph, pydeps, jdeps | Reveals module seams and cycles the prose summary will miss. |
| Repo map for the agent | Aider repo-map, ctags, tree-sitter symbol index | Gives the agent a compressed skeleton so it reasons about the whole repo, not just open files. |
| Architecture sketch | Mermaid/PlantUML generated by the agent, then hand-corrected | Forces the model out of your head and onto a diagram you can falsify. |
| History & intent | git log/blame, PR history, ADRs, CODEOWNERS | The "why" lives in history and ownership, not in the code itself. |
| Runtime truth | Debugger, request tracing, logs, strace/profilers | What actually executes > what you think executes. Settles every disagreement. |
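To illustrate what the dependency-graph row buys you, here is a minimal sketch of the two questions such a graph answers: "who imports this module?" and "is there an import cycle?". The `ImportGraph` shape and the module names are assumptions for illustration; real tools such as madge or dependency-cruiser produce their own output formats.

```typescript
// Module name -> modules it imports (assumed shape, not any tool's format).
type ImportGraph = Record<string, string[]>;

// "Who imports it?": list the direct importers of `target`.
function importersOf(graph: ImportGraph, target: string): string[] {
  return Object.keys(graph)
    .filter((mod) => graph[mod].indexOf(target) !== -1)
    .sort();
}

// Find one import cycle via depth-first search, or return null if none exists.
function findCycle(graph: ImportGraph): string[] | null {
  const done = new Set<string>(); // modules whose imports are fully explored
  const stack: string[] = [];     // the current DFS path

  function visit(mod: string): string[] | null {
    const at = stack.indexOf(mod);
    if (at !== -1) return stack.slice(at).concat(mod); // back edge closes a cycle
    if (done.has(mod)) return null;
    stack.push(mod);
    for (const dep of graph[mod] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    done.add(mod);
    return null;
  }

  for (const mod of Object.keys(graph)) {
    const cycle = visit(mod);
    if (cycle) return cycle;
  }
  return null;
}
```

A cycle between two modules is often where a prose summary glosses over a seam, so it is a good place to start reading by hand.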
The test suite is the cheapest executable specification you'll ever get. It tells you what
behavior the team considers important, what the public contracts are, and how to call things correctly. Ask
the agent: "Summarize what the tests in /tests assert about this module's behavior, grouped by feature."
You cannot safely change what you cannot run. The goal of this phase is a fast, trustworthy feedback loop: a local environment that boots, a green test/lint/type baseline, and observability so you can see what the code does at runtime. This is the safety net that makes EXTEND non-scary. Getting the app running with green tests is your true Day-1 milestone — not your first feature.
… make setup, nix, or the documented script.
… main, that's finding #1: you need a green baseline or you can't tell your breakage from pre-existing breakage.
An agent that can run your tests and read your logs can self-correct; one that can't will guess. This is the single biggest reliability lever in the RUN phase.
Tell the agent the exact build/test/lint/run commands. Put them in CLAUDE.md/AGENTS.md so it never guesses.
Let it run tests and read failures, tail logs, hit endpoints. MCP servers for git, the DB, Sentry, and the browser turn "I think" into "I checked."
A disposable branch, a scratch DB, and permission scoping so an over-eager agent can't touch prod, secrets, or shared state.
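When the baseline is not fully green, one way to still tell your breakage from pre-existing breakage is to record the failing tests before you change anything and diff against that list afterwards. This is a minimal sketch; the function name and the idea of identifying tests by string name are assumptions, and how you collect the failing names depends on your test runner.

```typescript
interface BaselineDiff {
  newFailures: string[]; // failing now, passing at baseline: likely caused by your change
  preExisting: string[]; // failing at baseline and still failing: not yours, but worth recording
  fixed: string[];       // failing at baseline, passing now
}

// Compare the failing tests recorded before your change with those failing after it.
function compareToBaseline(
  baselineFailing: string[],
  currentFailing: string[],
): BaselineDiff {
  const baseline = new Set(baselineFailing);
  const current = new Set(currentFailing);
  return {
    newFailures: currentFailing.filter((t) => !baseline.has(t)).sort(),
    preExisting: currentFailing.filter((t) => baseline.has(t)).sort(),
    fixed: baselineFailing.filter((t) => !current.has(t)).sort(),
  };
}
```

Only `newFailures` should block your change; `preExisting` belongs in your notes as a finding about the codebase.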
Before you change anything, ask the agent and the team: "What parts of this system are fragile, undertested, or 'don't touch'? Where are the migrations, the money paths, the auth, the cron jobs, and the external integrations?" Knowing where the landmines are is half of not stepping on them.
    # A minimal CLAUDE.md / AGENTS.md so the agent works *with* your loop

    ## Commands
    build: pnpm build
    test:  pnpm test          # fast subset: pnpm test:unit
    lint:  pnpm lint && pnpm typecheck
    run:   pnpm dev           # app on :3000, requires .env.local

    ## Conventions
    - Always run lint + typecheck + test before declaring a change done.
    - Money/auth code lives in src/billing and src/auth: propose, don't auto-edit.
    - Never edit db/migrations/* that are already applied; add a new migration.

    ## Don't touch
    - infra/, .github/workflows/, anything reading PROD_* env vars.
Now you ship. The discipline here is simple to state and hard to keep: small, reversible, verified changes that respect the system's grain. The agent makes generation fast, which makes it tempting to take big swings — resist. Big AI-authored PRs are where regressions hide and reviewers rubber-stamp.
If the code you're about to touch is undertested, write tests that capture current behavior before changing anything — even behavior you suspect is wrong. Now any change that alters behavior shows up as a failing test. This is Michael Feathers' core technique, and it's the thing that makes AI-assisted refactoring of legacy code safe.
Great agent use: "Write characterization tests that capture the current
behavior of computeDiscount(), including edge cases, without judging whether it's correct."
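As a rough sketch of what such characterization tests boil down to: record what the code does today for a set of inputs, then flag any input whose output changes after an edit. The body of `computeDiscount` below is a hypothetical stand-in (the real one is whatever the codebase contains), and `recordBehavior` and `findBehaviorChanges` are illustrative names.

```typescript
// Hypothetical stand-in for the legacy function; its rules are invented for this example.
function computeDiscount(subtotal: number, loyaltyYears: number): number {
  if (subtotal <= 0) return 0;
  const rate = loyaltyYears >= 5 ? 0.15 : loyaltyYears >= 1 ? 0.05 : 0;
  return Math.round(subtotal * rate * 100) / 100;
}

type RecordedCase = { args: [number, number]; observed: number };

// Step 1: record what the code does today, without judging whether it is right.
function recordBehavior(inputs: Array<[number, number]>): RecordedCase[] {
  return inputs.map((args) => ({
    args,
    observed: computeDiscount(args[0], args[1]),
  }));
}

// Step 2: after a change, return every recorded case whose output now differs.
function findBehaviorChanges(
  recorded: RecordedCase[],
  candidate: (subtotal: number, loyaltyYears: number) => number,
): RecordedCase[] {
  return recorded.filter((c) => candidate(c.args[0], c.args[1]) !== c.observed);
}
```

An empty result from `findBehaviorChanges` means the refactor preserved behavior on the recorded inputs only, so the inputs should include the edge cases (zero, negative, boundary values) you would ask the agent to cover.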
Your first feature change should be a thin end-to-end slice that touches every layer the way a real feature would — UI/API → service → data → test — but does the minimum. It proves you understand the wiring and exercises your whole RUN loop. Then thicken it.
… git bisect and revert stay cheap.
The fastest way to get a PR rejected, or to rot a codebase, is to introduce a pattern the team doesn't use. Before generating, tell the agent the local conventions and point it at an exemplar:
    # Anchor the agent to existing patterns instead of its training-data defaults
    "Add an endpoint to cancel an order. Follow the exact pattern in
    src/api/orders/refund.ts: same validation, error handling, and test layout.
    Use the existing OrderService; do not add a new data-access layer.
    Show me a plan first; don't edit until I approve."
For anything non-trivial, make the agent propose a numbered plan and the files it will touch before it writes code. You catch wrong-direction changes when they're a sentence to fix, not a 300-line diff to untangle. If you can't predict the outcome of each step, the plan isn't ready.
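One lightweight way to hold the agent to its plan is to compare the files it declared up front with the files it actually changed (for example, the output of `git diff --name-only`). The sketch below assumes you have captured both as lists of paths; the `Plan` shape and the function name are illustrative.

```typescript
interface Plan {
  steps: string[];        // the numbered plan, one entry per step
  filesToTouch: string[]; // files the agent declared before writing code
}

// Return files that were changed but never declared in the approved plan.
function outOfPlanChanges(plan: Plan, changedFiles: string[]): string[] {
  const approved = new Set(plan.filesToTouch);
  return changedFiles.filter((file) => !approved.has(file)).sort();
}
```

Any path this returns is a surprise by definition, which makes it the first hunk to read in the diff.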
To replace a subsystem, build the new path beside the old one and route traffic over incrementally, deleting the old only once nothing calls it. Agents excel at the mechanical "build the parallel path and migrate callers" work — you own the routing decisions.
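The incremental routing described above can be sketched as a small router that sends a stable percentage of traffic to the new path. This is an assumption-laden illustration: the `Handler` type, the hash, and keying on an order ID are all hypothetical choices, and a real system would likely use its existing feature-flag mechanism instead.

```typescript
type Handler = (orderId: string) => string;

// Stable hash so the same key always lands in the same bucket (0..99).
function bucketOf(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) % 1000003;
  }
  return hash % 100;
}

// Route `rolloutPercent` of keys to the new path; everything else stays on the old one.
function makeRouter(
  oldPath: Handler,
  newPath: Handler,
  rolloutPercent: number,
): Handler {
  return (orderId) =>
    bucketOf(orderId) < rolloutPercent ? newPath(orderId) : oldPath(orderId);
}
```

Because the bucket is derived from the key rather than from randomness, a given order always takes the same path at a given rollout level, which keeps comparisons and rollbacks predictable. The old path can be deleted only once the rollout is at 100 and nothing else calls it.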
For a big refactor, attempt the change, note what breaks, revert, fix the prerequisites first, repeat. The agent can rapidly explore "what breaks if I do X" on a throwaway branch and report the dependency tree before you commit to an order.
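Once the attempt-and-revert loop has produced a dependency tree, deciding what to do first is a post-order walk: every prerequisite comes before the change that needs it. A minimal sketch, assuming the tree is acyclic and recorded as a map from each change to the prerequisites its attempt revealed (the `Prereqs` shape and all change names in the usage are hypothetical):

```typescript
// change -> prerequisites discovered when attempting that change broke something.
type Prereqs = Record<string, string[]>;

// Post-order walk from the goal: leaves (safe first steps) come out first, the goal last.
// Assumes the graph has no cycles.
function mikadoOrder(graph: Prereqs, goal: string): string[] {
  const order: string[] = [];
  const seen = new Set<string>();

  function visit(change: string): void {
    if (seen.has(change)) return; // shared prerequisite, already scheduled
    seen.add(change);
    for (const pre of graph[change] ?? []) visit(pre);
    order.push(change);
  }

  visit(goal);
  return order;
}
```

For example, if replacing a repository class requires both extracting an interface and removing static calls, and removing the static calls itself needs the interface, the interface extraction comes out first and the replacement last.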
On a new codebase your agent is as ignorant as you are — worse, it doesn't know it's ignorant. The leverage is in feeding it the right context: the conventions, commands, danger zones, and exemplars that aren't obvious from any single file. This is the highest-ROI investment of your first week, and it compounds: everything you learn in MAP and EXTEND flows back into these files.
Most agentic tools read a project rules file automatically: CLAUDE.md (Claude Code),
AGENTS.md (an emerging cross-tool standard), .cursor/rules (Cursor),
.windsurfrules, Copilot instructions. Put in them what a new senior hire would need on day one:
The Model Context Protocol (MCP) lets agents pull live context and act through standardized servers instead of guessing. The high-value ones for onboarding:
More context isn't better — relevant context is. Dumping the whole repo into the window buries the signal and invites confident nonsense. Curate: the few files that matter, the exact conventions, one good exemplar. Use subagents or fresh sessions for independent sub-questions so one investigation doesn't pollute the next.
The tools cluster by autonomy (how much they do without you) and scope (a single edit vs. the whole repo). For onboarding, you mostly want the high-context, human-in-the-loop quadrant: tools that can read the whole repo and explain it, while keeping you in control of every change.
| When you need to… | Reach for | Note for a new codebase |
|---|---|---|
| Understand & navigate a big repo | Claude Code, Cursor, Cody, Windsurf | Agentic context + citations; ask for file:line and verify. |
| Search a very large / monorepo precisely | Sourcegraph, ripgrep, Grep MCP | Ground truth that keeps the agent honest. |
| Make a focused multi-file change you'll review | Aider, Claude Code, Cursor agent | Plan → approve → diff. Best for EXTEND. |
| Stay in your editor for small edits | Copilot, Cursor tab, JetBrains AI | Fine once you understand the area; weak for first-contact comprehension. |
| Offload a well-specified, low-risk task | Devin, Codex (cloud), Copilot agent, Junie | Only after MAP + RUN exist and the task is boxed and verifiable. |
| Prototype a fresh UI / spike | v0, bolt.new, Lovable | Greenfield only — not for integrating into an existing system. |
Tooling churn is real. Names, modes, and leaders shift every few months — treat the specifics above as a snapshot, not gospel. The categories and the selection logic (context depth × autonomy × your verification capacity) are what stay stable. Pick for the job, not the hype.
Every AI-generated change deserves the same question: do I trust this enough to land it? The answer is a function of how well you understand it, whether the safety net would catch a regression, and how big the blast radius is. This flow is the gate.
If you can't explain the change in plain English, you can't own it. Read the diff or ask the agent to justify each hunk — then decide.
Tests, types, and CI are what let you move fast. If a regression wouldn't be caught, your job is to make it catchable before landing.
Small + reversible = safe to ship and learn. Wide or irreversible = flag it, stage it, or split it until it isn't.
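The three checks above can be written down as a small decision function, which is a useful way to make the gate explicit in review notes. This is a sketch of one reading of the flow: the type names, the decision labels, and the order in which the checks are applied are assumptions.

```typescript
type BlastRadius = "small-reversible" | "wide-or-irreversible";
type Decision = "land" | "read-the-diff-first" | "add-tests-first" | "split-or-stage";

interface ChangeAssessment {
  canExplainInPlainEnglish: boolean; // do you understand the change?
  regressionWouldBeCaught: boolean;  // would tests, types, or CI catch a break?
  blastRadius: BlastRadius;          // how far could a mistake spread?
}

// Apply the gates in order; the first failing gate decides what to do next.
function trustGate(change: ChangeAssessment): Decision {
  if (!change.canExplainInPlainEnglish) return "read-the-diff-first";
  if (!change.regressionWouldBeCaught) return "add-tests-first";
  if (change.blastRadius === "wide-or-irreversible") return "split-or-stage";
  return "land";
}
```

Understanding is checked first on purpose: a change you cannot explain is not worth assessing for test coverage or blast radius yet.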
Every one of these traces back to the same root — the agent's fluency reads as competence, so you stop checking. A confident, well-formatted, runnable answer feels like a correct one. On a codebase you don't yet understand, that feeling is exactly the thing to distrust.
A concrete cadence for a principal joining a new codebase. Each horizon has one headline outcome.
… CLAUDE.md/AGENTS.md.
Print this. Tick it as you go. It's the whole paper compressed into actions.
… CLAUDE.md/AGENTS.md with commands & conventions
Use the agent to understand faster and to change safer, never to skip understanding or skip verification. Comprehension, judgment, and the diff review stay yours. That's the line between a principal who ships confidently on a new codebase and one who becomes its next incident.
A working bibliography behind this playbook — the ideas it leans on and the tools it references. Tooling docs move fast; treat versioned specifics as a 2026 snapshot.
AGENTS.md, .cursor/rules) and repo-map behavior.