Claude isn't one model — it's a family. The three names you'll see most are Opus, Sonnet, and Haiku. Each has a different sweet spot for capability, speed, and price. Picking the right one for a given task is small but compounds: the cost and time savings over a month of work add up to real money and real hours.
The three tiers, in plain English
Opus
The biggest, smartest model. Best at deep reasoning, complex refactors, multi-step planning, and any task where being a little wrong costs a lot. Slowest and most expensive per token. Worth it for high-stakes work, prohibitive for cheap repetitive work.
Sonnet
The middle ground. Most of what you'll do day-to-day in Claude Code is well served by Sonnet — code edits, bug fixes, small feature work, code reading. Substantially cheaper and faster than Opus, while still very capable. The default choice unless you know you need otherwise.
Haiku
The smallest and fastest. Good for high-volume, well-bounded tasks: filtering a list, formatting text, classifying inputs, bulk operations that don't need much judgment. Cheap enough to run constantly, fast enough that you don't notice the wait.
How to choose, in practice
A rough decision tree:
- Is the task small, well-bounded, repetitive? Haiku.
- Is it a typical code task — read a few files, make a change, verify? Sonnet.
- Is it a complex investigation, a tricky refactor, or architectural reasoning? Opus.
- Are you not sure? Sonnet. Sonnet is the right default; you can escalate later if it struggles.
Switching mid-session
You can change models within a session — useful when you start with one and realise it's not the right fit. A common pattern: start in Sonnet for the bulk of the work, switch to Opus when you hit a thorny problem, switch back when the hard part is solved.
Switching is a one-line command (/model opus or similar). The conversation context carries over; only the model behind the curtain changes.
Subagents and model choice
Subagents inherit the parent's model by default, but you can override per spawn. This is where real cost optimisation lives. A common setup:
- Main session: Sonnet. Talking to you, planning, steering.
- Research subagents: Haiku. Wide reads of many files, summary back. Fast and cheap.
- Code-review or architecture subagents: Opus. Want the careful eye.
A workflow built this way costs a fraction of the same workflow run entirely on Opus, with little quality loss.
The large context variants
Some models offer a large-context variant — a version of the same model with a much bigger context window (e.g., 1M tokens). Useful for:
- Reading enormous codebases in one shot.
- Working with very long log files or transcripts.
- Tasks that legitimately need to keep many files in mind at once.
Large-context variants typically cost more per token. The right move is to use them only when you genuinely need the window — not as a default. Most tasks fit fine in the standard context.
Fast mode
Some Claude Code setups offer a fast modeon Opus — same model, faster output speed. Useful when latency is the bottleneck rather than capability. Toggle it on for interactive sessions where you want quicker turn-around; toggle it off when you're running unattended (the speedup doesn't help and the cost is the same).
Reading your usage
Anthropic's dashboard shows your token consumption by model. Once a week or so, glance at it. Surprises usually mean one of two things:
- A scheduled task or background agent is running more often than you intended.
- You're using a more expensive model for a class of work where a cheaper one would do.
Both are easy to fix once you notice. The cost of not noticing is the "wait, why is my bill this big" surprise nobody enjoys.
CLAUDE.md re-read every session — is often cached by the API, dramatically reducing cost on the repeated portion. Structure long-running sessions so the stable context comes first, followed by the variable work. The model isn't magic, but the billing is generous about repetition.Don't let cost paralysis kill velocity
On the other side: don't become so cost-conscious that you avoid using Claude Code for valuable work. The thirty dollars Opus costs to plan and execute a tricky refactor is a fraction of the value of the refactor done well, and a tiny fraction of the cost of a human afternoon spent on it. Spend confidently where the work warrants. Be efficient where it doesn't.
- Haiku for bulk & bounded work. Sonnet as the default. Opus for hard, high-stakes thinking.
- You can switch models within a session. Often the right move is Sonnet → Opus for one hard chunk, then back.
- Use subagents on Haiku for wide research; reserve Opus for the parent or the critical reviewer subagent.
- Large-context variants only when you actually need the extra window. Don't default to them.
- Check usage weekly. Surprises usually mean an over-eager scheduler or a model mismatch.