01Steering the agent Pillar

An agent will do the wrong thing — confidently — if you let it guess

Steering isn't ordering louder; it's leaving no room to guess

Published2026-05-31
Read5 min read
TypeField notes
TL;DR

An agent rarely says "I don't understand"; it guesses, then does the guess with total confidence. Steering it isn't about issuing sharper orders — it's about closing every gap where it's allowed to guess: structured commands instead of vague ones, forcing functions that block the bad default, and a habit of making it ask before it touches the keyboard.

There's a belief that creeps in when you first work with an agent: that if you're not clear enough, it'll ask. People do that — anyone handed a fuzzy task looks up and says "wait, what do you mean?"

Agents don't. Hand one an instruction that's seven-tenths clear and it'll fill the other three-tenths with a guess — then charge into building the version it invented, as crisp and decisive as if that were exactly what you wanted. It doesn't pause at the line of doubt. It sails past it.

01It's not defying you. It's guessing.

Here's the counterintuitive part: a human junior who's unsure gets tentative — asks again, works cautiously. An unsure agent gets confident, because it has no felt sense of "I'm not certain." It only has a probability distribution, and it picks the likeliest branch and moves on as if no other branch existed.

So the danger isn't that it refuses. The danger is that it delivers — fast, clean, and answering the wrong question. You ask it to "tidy up the error handling," it hears "rewrite the entire error layer my way." Both match the words you typed. Only one matches what you meant.

Steering an agent, then, is not about speaking louder or sterner. It's about shrinking the room it has to guess down toward zero, before it fills that room with judgment of its own.

02Layer 1 — Command structure: say what it needs, not what you're thinking

A vague instruction isn't vague because it's short; it's vague because it leaves the load-bearing things open. "Optimize this function" leaves open: optimize along which axis — speed, memory, readability? May it change the signature? Are there constraints it must not break?

Don't make it answer those. Answer them up front. A minimally structured command has three parts: a goal (measurable, not an adjective), boundaries (what's fair game, what's off-limits), and a result shape (what you want back so you can check it).

Compare these two, same intent:

Vague — leaves room to guess

"Optimize the order-processing function to be faster"
Faster how? change the signature? touch the shared cache?
It decides all of it → confidently wrong

Structured — no gaps left

"Cut the runtime; do NOT change the signature or the shared cache"
"When done, show me before/after on the same input"
Only one correct reading gets through

You don't need prose. You need to close the branches. Every spot you leave open is a spot it will fill — and it fills with whatever's most common in its training data, not with the specific thing in your head.

03Layer 2 — Forcing functions: block the bad default, don't just warn against it

Command structure handles what it should do. But agents have defaults strong enough that a polite warning won't hold: refactoring on the side, adding a library to make things "nicer," renaming for "consistency," writing a paragraph of explanation instead of doing the work.

"Don't go off-script" is nearly useless — it's advice, and advice competes weakly against habit. What works is a forcing function: a constraint that makes the bad shortcut impossible, not merely inadvisable.

That's the whole difference. "Don't change many files" is a warning. "Edit only file x; to touch any other file, stop and ask first" is a forcing function — it turns the bad move into a gate it has to knock on. "Test it carefully" is a warning. "Paste the exact test command you ran and its real output" is a forcing function — no output means not done.

The rule: every time you're about to write "remember not to…," stop and ask how do I make it unable to. Turn the warning into a required step, a required format, or a required stop.

04Layer 3 — Make it ask before it guesses

The two layers above seal the gaps you can see in advance. But there will always be gaps you didn't foresee — and that's exactly where an agent quietly guesses.

The fix is surprisingly cheap: grant it permission to say "I'm not sure," then require it to use that permission. One line at the top of a hard task: "Before doing anything, list the parts of this request that are ambiguous or could be read more than one way. If there are any, ask me. Don't choose for me."

That line does something subtle: it flips the default from guess-then-do to surface-the-blind-spots-first. Most of the time it'll pull out two or three things it would otherwise have guessed wrong. Answering three questions costs thirty seconds. Untangling a pile of rework because it guessed wrong costs an afternoon.

One tell worth memorizing: when it says "I'll assume…" — that's not information, that's a spot it just guessed without asking. Every "I'll assume" is an invitation to check.

05Steering is design, not luck

Put together, the three layers are one idea: don't hand off the work and hope — design a mold where only the correct reading fits through. Command structure closes the branches you can see. Forcing functions block the habits you know. The clarify-first question pulls out the blind spots you can't see.

It sounds like a lot of work. But look closer: all three are done once, at the start, measured in seconds. The thing they replace — discovering it misread the task after it's already finished — is paid for in an afternoon, and usually paid right when you thought you were done.

An agent will never look up and ask "what do you mean?" like a junior would. So that job becomes yours: ask on its behalf, up front, by leaving no room for the question to need asking.

End of pieceCluster 01 · 1/4
The author

craftagent is the notebook of someone still building — told over coffee, each story wrapped around a lesson paid for in full.