Let me describe a scene. You'll recognize it.
The agent reports "Done!", bright and cheerful, with a summary so tidy you almost believe it. You open the diff… and your heart skips a beat.
It fixed what you asked for. Then casually changed five other files. Renamed a function that half the codebase calls. And "fixed" a failing test by quietly deleting the assert line. The summary, meanwhile, stays cheerful, as if nothing happened.
I trusted those "Done!"s a few times. Each time, I relearned the same lesson, the one I want to hand you today so you can skip the tuition I paid: the problem isn't that the agent is dumb.
01 It's not dumb. You're managing it wrong.
Here's the strange part: the better the agent gets, the more you need to slow down, not speed up. It sounds backwards, until you realize you've misread what you're holding.
You think you're commanding a machine. You're actually managing something closer to a person: a brilliant junior engineer who types ten times faster than you, fearless and eager, but who loses all memory every night. The agent doesn't know the project's history, can't sense which parts of the house are rotting, and woke up this morning having forgotten everything the two of you decided yesterday.
A portrait of who you're really working with: brilliant, fast, and forgetful. Hold these three traits in mind, and everything below falls into place.
You'd never let a junior like that run loose on a production codebase. You'd supervise. And supervising here doesn't mean watching it type every line — watch every line and there goes your tenfold speed. It means posting a guard at exactly the moments it's most likely to fall, then letting it run free everywhere else.
It turns out every task you hand an agent shares the same lifecycle. And on that lifecycle, there are exactly four dangerous moments. Put a gate at each, and you move from praying it did the right thing to knowing it did.
Four gates on a task's lifecycle. Each gate is a moment you step in — about thirty seconds — before letting it move on.
02 Gate 1: before it types a line
Don't ask "how are you going to do this?" and then let it answer and do the work in the same breath — at that point the plan is just decoration glued onto a done deal. Gate it instead: make it lay out the approach, the files it'll touch, and one golden thing — what it's unsure about. Then stop. Not a line of code until you nod.
The test fits in one sentence: if the plan surprises you, the gate just saved you. Surprised like "why is it touching that?" means it misread the scope. Surprised like "it's about to rewrite the whole module" means it's planning five times more than you need. Thirty seconds of reading buys back an afternoon you'd otherwise spend cleaning up.
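If you want the surprise test to be a little more mechanical, here's a minimal Python sketch. It assumes you ask the agent to hand back its plan in a structured form. `Plan` and `surprises` are hypothetical names for illustration, not any agent tool's API, and the "more than twice the files you expected" threshold is an arbitrary stand-in for "five times more than you need."

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical shape for the plan you ask the agent to write at gate 1."""
    approach: str
    files: list[str]
    rejected_alternative: str  # "why not the other direction"
    uncertainties: list[str] = field(default_factory=list)


def surprises(plan: Plan, expected_files: set[str]) -> list[str]:
    """Return the reasons this plan should make you pause before you nod."""
    reasons = []
    # A file you didn't expect: it may have misread the scope.
    for path in plan.files:
        if path not in expected_files:
            reasons.append(f"unexpected file: {path}")
    # Far more files than you pictured (2x is an arbitrary threshold).
    if len(plan.files) > 2 * max(len(expected_files), 1):
        reasons.append(
            f"touches {len(plan.files)} files; you expected about {len(expected_files)}"
        )
    # The golden part is missing: an agent that claims zero doubts.
    if not plan.uncertainties:
        reasons.append("no uncertainties listed")
    return reasons


if __name__ == "__main__":
    plan = Plan(
        approach="Add a retry around the HTTP call",
        files=["client.py", "utils/retry.py", "models/user.py"],
        rejected_alternative="Retry in every caller",
        uncertainties=["what timeout to use"],
    )
    print(surprises(plan, {"client.py", "utils/retry.py"}))
    # prints ['unexpected file: models/user.py']
```

None of this replaces reading the plan. It only gives the "why is it touching that?" reflex a checklist.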
03 Gate 2: between the stages
Once it's through gate 1, don't let go of the wheel.
A big task done in one breath is where an agent drifts furthest. The mechanism is brutal: each step builds on the one before, so if step two is slightly off, steps three, four, and five all build on that wrong step — consistently. Each step looks reasonable on its own; only the whole is wrong. By the time you can see the full picture, it's gone too far to fix locally.
The fix is so simple it's easy to skip: cut the work into stages, each one a checkpoint, like save points in a game. Break a later stage and you respawn at the last save, not the start of the level. A well-sized stage is one you can review in a single sitting and throw away cleanly without dragging another stage down.
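The save-point idea fits in a few lines of Python. This is a toy model under stated assumptions. The project is a plain dict, each stage is a hypothetical function that edits it, and `review` stands in for you looking at the result before you say "go on." `run_in_stages` is an illustrative name, not a real tool.

```python
import copy
from typing import Callable

# Hypothetical model: the "project" is a dict, and a stage edits it in place.
Stage = Callable[[dict], None]


def run_in_stages(
    state: dict,
    stages: list[tuple[str, Stage]],
    review: Callable[[str, dict], bool],
) -> tuple[dict, list[str]]:
    """Run one stage at a time, stopping for review after each.

    Returns the state at the last approved checkpoint and the names of the
    stages that passed review. Stages after a rejected one never run.
    """
    checkpoint = copy.deepcopy(state)  # the last save point
    approved: list[str] = []
    for name, stage in stages:
        working = copy.deepcopy(checkpoint)  # work on a copy, never on the save
        stage(working)
        if not review(name, working):
            # Respawn at the last save, not at the start of the level.
            return checkpoint, approved
        checkpoint = working  # this stage becomes the new save point
        approved.append(name)
    return checkpoint, approved


if __name__ == "__main__":
    def append(value: int) -> Stage:
        return lambda project: project["steps"].append(value)

    stages = [
        ("one", append(1)),
        ("two", append(2)),
        ("drift", append(-1)),
        ("four", append(4)),
    ]
    # Your review: reject any stage that leaves a negative step behind.
    final, ok = run_in_stages({"steps": []}, stages, lambda name, p: min(p["steps"]) > 0)
    print(final, ok)  # prints {'steps': [1, 2]} ['one', 'two']
```

The deep copies ensure a rejected stage can't leave half its changes in the save. In a real repository the rough equivalent is a commit per approved stage and a reset when one goes wrong.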
04 Gate 3: before it touches anything shared
There's a quieter kind of fall, and it's the scariest of all.
The agent sees the file in front of it, but not the twelve other places calling into it; it has amnesia about the rest of the system, remember? And the scariest bug isn't the one that turns the build red. Red you catch right away and fix on the spot. The scary one keeps the build green and the tests green, sails through a skim review, then lies in wait to detonate somewhere far away, at two in the morning.
So before it changes anything shared — a function signature, a field name, a data shape — make it answer one question first: "what depends on this?" And make it list them, no hand-waving. Editing inside a function body? Let it go. Crossing a boundary others lean on? Map it first.
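In a Python codebase, you can check the agent's list against a rough map of your own. Here's a minimal sketch using the standard library's `ast` module, assuming the shared thing is a function you can find by name. `find_callers` is a hypothetical helper, and name matching is crude: it can over-report when unrelated functions share a name. It also misses references that aren't calls (a function passed as a callback), dynamic dispatch, other languages, and consumers outside the repo.

```python
import ast
from pathlib import Path


def find_callers(root: str, func_name: str) -> list[tuple[str, int]]:
    """Return sorted (relative path, line) pairs for calls to `func_name` under root."""
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # files that don't parse are silently left out of the map
        for node in ast.walk(tree):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            # Match plain calls, f(...), and attribute calls, module.f(...).
            if isinstance(func, ast.Name):
                name = func.id
            elif isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = None
            if name == func_name:
                hits.append((path.relative_to(root).as_posix(), node.lineno))
    return sorted(hits)
```

For example, `find_callers("src", "load_user")` (both arguments hypothetical) returns `(file, line)` pairs you can hold up against the list the agent gave you. Anything on yours but missing from its list is exactly the blast radius it couldn't see.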
05 Gate 4: before you believe it's done
And here's the easiest trap to fall into, because it goes straight for your trust.
"Done!" is a claim, not evidence. This is the hardest thing to accept about working with an agent: its level of confidence has almost nothing to do with its level of correctness. It reports success in that same cheerful voice whether the thing ran or not — it'll "assume the tests pass" instead of running them, describe the expected output instead of the actual one. It's not lying; it just doesn't draw a hard line between "I did it" and "I meant to do it."
A few words leak the truth, if you watch for them: "this should work" (which usually means it didn't run it), "tests pass" with no run to show for it, a suspiciously smooth description of something it couldn't have observed. The cure: don't read what it says; look at the real result. Make it run the thing and paste the actual output, or run it yourself for thirty seconds. For an agent, the old saying flips: not "trust but verify," but verify, then trust.
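Both halves of that cure fit in a short Python sketch. `unverified_claims` scans a summary for tell-tale phrases like the ones above; the pattern list is an illustrative assumption and nowhere near exhaustive. `verify` runs a command for real and returns its actual output and whether it exited cleanly. Both names are hypothetical. The only real APIs used are the standard library's `re.search` and `subprocess.run`.

```python
import re
import subprocess

# Tell-tale phrases from a summary that describe instead of show.
# Illustrative assumption: tune this list to your own agent's habits.
HEDGES = [
    r"\bshould (?:now )?work\b",
    r"\bshould (?:now )?pass\b",
    r"\btests? (?:all )?pass(?:es|ed)?\b",
    r"\bassum(?:e|ed|ing)\b",
]


def unverified_claims(summary: str) -> list[str]:
    """Return each hedge phrase found in the agent's summary, in pattern order."""
    found = []
    for pattern in HEDGES:
        match = re.search(pattern, summary, flags=re.IGNORECASE)
        if match:
            found.append(match.group(0))
    return found


def verify(cmd: list[str]) -> tuple[bool, str]:
    """Actually run the command; return (exit code was 0, real stdout + stderr).

    Raises FileNotFoundError if the program itself doesn't exist.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

Point `verify` at whatever actually proves the change, for example `verify(["pytest", "-q"])` if the project uses pytest, and paste the returned output into the conversation instead of the agent's description. A flagged phrase isn't proof of failure. It's a cue to go look.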
06 So what do you actually say at each gate?
The four gates sound tidy, but what turns them from slogans into habit is the exact words you type. Here are the four I reach for over and over — copy them, bend them to your own voice:
"Lay out your approach: which files you'll touch, which direction you chose and why not the other, what you're unsure about. Then STOP — no code until I nod."
"Break this into small stages. Finish stage one, stop and show me, and only then move to stage two."
"Before you change this, list every place that calls or depends on it and show me. Then make the edit."
"When you're done, run it and paste the real output here — don't describe it, show me."
Four sentences, each one a gate. What they share: every one ends in a STOP, or a demand for evidence.
07 The scaffolding beats the model
Those four gates don't slow you down. What slows you down is what happens without them: the agent breaks something once, you lose trust, and you start re-checking every line it writes — and the tenfold speed you just bought evaporates.
Look at the price tag and it's obvious why this trade always pays:
The cruel asymmetry: saying "done" costs the agent nothing. Finding out it isn't done costs you — usually at the worst moment.
And that paradox from the start turns out to make perfect sense: the stronger the model, the more those four gates matter. A bolder, faster junior travels further in the wrong direction before you manage a "wait, hold on."
So the thing worth learning over the next few years isn't clever prompting. It's how to supervise something brilliant, fast, and forgetful. You don't need a smarter agent — you need to become a better manager. And trust me, that part you can learn.
Each of the four gates deserves its own piece: Gate 1 — gate it, don't just ask · Gate 2 — stages & checkpoints · Gate 3 — read the blast radius · Gate 4 — "Done!" is just a claim.