Claude Code, for the pros

Why context engineering is the actual skill, and what the new primitives look like when you sit down with someone from Anthropic for an afternoon.

May 18, 2026

I spent an afternoon in a Claude Code session with someone from Anthropic. Twenty slides in I realised I wasn’t watching a tooling demo, I was watching a quiet redefinition of what “using an LLM well” even means.

The headline isn’t a new model. It’s a stack of primitives — Skills, CLAUDE.md hierarchies, MCP, Hooks, Plugins, subagents, agent teams, parallel Claude, verification loops — that collectively answer one question:

Once the model is good enough, what’s actually left for the human to do?

The answer, as far as I can tell after this session, is context engineering. Not prompting. Not “knowing the magic words.” Engineering the information environment the model operates in — what it sees, what it doesn’t, when it sees it, who else is looking, and what gets to count as “done.”

That’s the thread I want to pull on. Here’s what I took away, organised the way I now actually think about it.

1. The shift: from prompting to context engineering

The old mental model was: write a clever prompt, get a clever answer. Iterate.

The new model is closer to: design the context window. The prompt is a small part of it. The much bigger part is what Anthropic frames as four context management strategies:

Actively manage your context — visualize it (/context), prune it (/compact), don’t let it accrete junk.
Isolate distinct tasks by session — /clear aggressively, or spin up parallel Claudes for separate concerns.
Shrink your search space — point Claude at exactly the files it needs (CLAUDE.md, @-mentions) instead of letting it grep the whole repo.
Progressively disclose context — load capabilities only when needed (this is the whole Skills design philosophy).

If you internalise nothing else from this post, internalise that list. Every other primitive below is in service of one of those four moves.

For anyone working in multilingual AI, this maps onto territory I think about constantly with Black Ice: the unit of correctness isn’t the string the model produces, it’s the structured context the model was operating inside. Context engineering for code is the same discipline, with different vocabulary.

2. CLAUDE.md: instructions as a hierarchy, not a prompt

CLAUDE.md is a markdown file that lives in your repo (and ~/.claude/CLAUDE.md lives at user level). Claude reads it on startup. That’s it. That’s the feature.

What makes it interesting is the hierarchy:

~/.claude/CLAUDE.md — your personal preferences, your global thinking instructions
Monorepo/CLAUDE.md — system overview, modernization strategy, team structure
Repo/CLAUDE.md — component architecture, technical debt, migration plans
Submodule/frontend/CLAUDE.md, Submodule/backend/CLAUDE.md, Submodule/test-cases/CLAUDE.md — local specifics

Claude assembles the right slice based on where you’re working. You stop repeating “use pnpm, not npm. Run tests with pytest. Follow PEP8” in every session, because it’s already loaded.

The mental shift: persistent instructions belong in version control, not in your prompt history. Your prompts get shorter. Your repo gets smarter.

3. Skills: progressive disclosure as a design principle

This was the moment I sat up.

A Skill is an organised folder containing a SKILL.md plus whatever else the skill needs — reference docs, executable scripts, templates, examples.

The trick is progressive disclosure:

In plain English: Claude has hundreds of latent capabilities sitting on disk. It pays a tiny token cost (~100 tokens of metadata) per Skill to know they exist. It only pays the full cost of reading the body when the skill is actually relevant to the current task. It only pays the cost of bundled assets when it specifically needs them.

This is the architecture you’d design if you took context windows seriously as a constraint and you wanted the agent’s effective competence to be unbounded. Two types of Skills they highlighted:

General capabilities Claude isn’t good at out of the box (yet) — e.g. creating PDFs, Excel, PowerPoint files.
Knowledge of an organisation’s workflows and best practices — e.g. Anthropic’s own brand styling, internal coding standards.

The second category is where this gets genuinely interesting for anyone working on enterprise AI. A Skill is essentially a portable unit of organisational expertise — versioned, testable, distributable. The localisation parallel is obvious: a Skill that encodes how your company handles dates, currencies, honorifics, and ICU message format across 30 locales is doing exactly the kind of work an ontology does in a knowledge-graph pipeline. Different mechanism, same job: keep the model from improvising on things it shouldn’t improvise on.

One tip, worth pinning above your monitor: be very explicit in the description of your skill. That YAML frontmatter is what Claude reads to decide whether to load the rest. Vague descriptions = skills that never fire.

4. MCP and Hooks: where Claude reaches outside its box

MCP (Model Context Protocol) is how Claude talks to external systems — Atlassian, Asana, Linear, Canva, Sentry, Cloudflare, your own internal services. You add a server with mcp add <server-name> and /mcp to manage them. Configurations live in .mcp.json.

Hooks are shell commands that fire on lifecycle events: PreToolUse, PostToolUse, Notification, UserPromptSubmit, SessionStart. Use cases the session covered:

Auto-format on save (prettier on .ts, gofmt on .go)
Run tests after edits
Send notifications on completion (Slack, text, “carrier pigeon as long as there’s an API”)
Logging
Correcting Claude behaviour deterministically

The mental model they offered for choosing between features is actually the clearest part of the whole deck:

Read that table twice. It’s the cheat sheet.

5. Plugins: distribution, finally

The newest piece. A plugin is a single distributable that bundles MCP servers, Skills, subagents, custom slash commands, and hooks under one manifest. Installed via terminal commands.

Two things matter here:

Unified distribution — you ship one thing instead of asking colleagues to configure five.
Enterprise marketplaces — organisations can run internal and external marketplaces to distribute plugins across teams.

This is the piece that turns Claude Code from “a tool a senior engineer configures heroically” into something that scales across an org. It’s also, frankly, what makes “AI workflow design” a discipline you can actually hire for.

6. The three parallelism primitives

Here’s where the mental model expands. Anthropic distinguishes three ways of running Claude in parallel, and they are not interchangeable.

Subagents — same session, separate context windows

One Claude session spawns multiple subagents. Hub-and-spoke. Workers report summaries back. They live as markdown files in agents/ — code-reviewer.md, researcher.md, writer.md.

When to use:

Quick, focused workers that return summaries
Isolating noisy operations (test runs, large greps)
When you need the cheapest token footprint

The three conditions where subagents work well, from the deck:

Well-defined roles — clear, specialised role with defined tool access levels and success criteria.
Hands-off delegation — you don’t need to inspect or interact with the subagent’s work.
Enhance context management — subagents are a context-management strategy in disguise.

Agent Teams — separate processes, shared task list

Multiple Claudes with a shared task list and peer messaging. Best for debate and cross-layer coordination. Higher token cost, but you can have agents disagree productively.

Patterns that work:

Fan-out research — spawn N subagents to research modules in parallel.
Isolate noisy ops — route verbose test/grep output to a subagent; only the summary returns.
Chain sequential hand-off — reviewer subagent finds issues → optimiser subagent fixes them. Separation of concerns without nesting.
Competing hypotheses — spawn 5 teammates on a hard bug; they debate to disprove each other. The surviving theory is the real root cause.

That last one — competing hypotheses — is the most genuinely new pattern I picked up. It’s the model arguing with itself on purpose, and the disagreement is the signal.

Parallel Claude (Worktrees) — true filesystem isolation

claude --worktree. Multiple Claude Code instances running simultaneously, each in its own git worktree, each on its own branch, each with its own context. True parallel implementation without file conflicts.

When to use:

Implementing on separate branches you actually intend to merge.
Risky refactors you want isolated.
Auto-cleanup if no changes get made.

For cross-session awareness, they recommend TMUX.

The honest summary: subagents for context isolation, agent teams for coordination and debate, worktrees for filesystem-level parallelism. Most people reach for one when they want another. Pick deliberately.

7. Verification: the line that stuck with me

Claude is only as good as the thing that checks its work.

This is the part of the talk that has the most to say to anyone working in localisation, governance, or any domain where “sounds right” isn’t good enough.

Three verification modalities:

Rules-based — define pass/fail rules. Claude runs the checker and reads the output.
Visual — screenshot the output, feed it back to Claude. Best for UI work via Playwright MCP.
LLM-as-judge — a second model evaluates output against fuzzy criteria.

And five verification patterns worth memorising:

Isolate test noise — “run the test suite and report only failing tests with errors.” Verbose output stays in the subagent’s context; your main session stays clean.
Post-implementation review — “review this diff for edge cases I missed.” Fresh context = no author bias on the code it just wrote.
Find → Fix chain — reviewer subagent finds perf issues, optimiser subagent fixes them.
Write-new-tests challenge — “write three new tests not in the existing suite. Verify they pass.” Catches what the happy-path suite won’t.
Autonomous loop (”Ralph”) — define pass criteria upfront → let Claude iterate write-test-fix unattended until green. Works for overnight runs.

And then the framing they put a red box around:

Test Driven Development. Write tests first → have Claude implement until they pass. Claude can’t rationalise away a failing test — the spec is the contract.

This is the move. If you’ve ever watched an LLM gaslight itself into thinking broken code is fine, you know exactly why a hard spec it cannot edit is the whole game.

The localisation parallel here is something I’ve been arguing for years: linguistic QA needs to move upstream into the spec, not get bolted on after generation. “Sounds fluent” is the LLM rationalising. A failing locale-specific test it can’t argue with is the contract.

8. The slash commands worth knowing

Three that came up repeatedly:

/batch — orchestrates large-scale changes across a codebase in parallel. Describe the change; /batch researches, decomposes, and presents a plan. On approval it spawns one background agent per unit, each in an isolated git worktree, each runs tests and opens a PR. Example: /batch migrate src/ from Solid to React → 5–30 parallel agents, 1 PR per unit, auto-opened.
/simplify — reviews recently changed files for code reuse, quality, and efficiency, then fixes them. Three internal agents: Code Reuse, Code Quality, Efficiency. Run after implementing a feature.
/loop — scheduled prompts. Run prompts automatically on an interval. Examples: /loop 5m check if the deployment finished, /loop 15m scan error logs and open fix PRs, /loop remind me at 3pm to push the release. Tasks run in the background within your session. Default interval is 10 minutes. One-time natural-language reminders also supported.

/batch is the headline. The other two are the daily-use workhorses.

9. The pattern underneath all of it

If I had to compress the whole session into one diagram, it’d be this loop:

Plan → Implement → Verify → (Fail? → Read error → back to Implement) → Finish

That’s it. Every primitive above is in service of making one of those four nodes better:

Plan — CLAUDE.md hierarchy, Skills, MCP for external context
Implement — subagents, agent teams, parallel Claude
Verify — rules-based / visual / LLM-as-judge, hooks
Iterate — /loop, autonomous loops, TDD

The thing that changes when you take this seriously is your job. You stop being the person who types prompts. You become the person who designs the context in which the prompts get evaluated.

That’s the discipline. Everything else is syntax.

What I’m taking back to my own work

Two parallels I can’t stop thinking about, both light-touch but real:

Progressive disclosure is the ontology pattern. A Skill’s three-tier loading model — metadata always, body on trigger, assets on demand — is structurally identical to how a well-designed multilingual ontology behaves: the schema is always present, the concept graph loads as the domain is invoked, the locale-specific assets resolve last. If you’ve built one, you can recognise the other instantly.

The “thing that checks the work” is the unit of governance. In code: tests, types, linters, LLM-as-judge. In multilingual AI: terminology constraints, locale rules, semantic validators. The model is interchangeable; the verifier is the moat. Whoever controls verification controls quality. This has been the Black Ice argument from day one and it was strange and gratifying to hear it phrased back at me by someone from Anthropic without either of us mentioning localisation.

TL;DR for the impatient

Context engineering is the actual skill. Prompting is a subset.
CLAUDE.md for persistent instructions. Skills for on-demand expertise. MCP for external systems. Hooks for deterministic automation.
Plugins bundle all of the above for distribution.
Three parallelism primitives — subagents (context isolation), agent teams (coordination/debate), worktrees (filesystem parallelism). Not interchangeable.
Verification is the moat. TDD with Claude works because it can’t argue with a failing test.
Slash commands worth learning today: /batch, /simplify, /loop.
The whole thing is a loop: Plan → Implement → Verify → Iterate.

If you take one thing away, take this:

Claude is only as good as the thing that checks its work.

Build the checker first.

If you found this useful, the rest of The AI-Ready Localizer digs into the same patterns from the multilingual side — same discipline, different vocabulary.
Like, share, subscribe!

The AI-Ready Localizer

Discussion about this post

Ready for more?