An agent is strong in the middle and falls at the edges — just as confidently

Rare cases are infrequent but expensive, and the agent has no flag for 'I'm not sure here'

Read: 2 min

Topics: production · edge-cases · reliability

TL;DR

An agent handles the common path well, then hits an edge case (an empty field, a strange format, an extreme input) and gets it wrong with the same confidence it uses for the common case. Edge cases are rare, so they slip past testing, but they're expensive when they blow up. The fix: actively test with bad cases, not just nice ones.

You ask the agent to write a piece of code that processes a list. You test it with a sample list and it runs fine. The agent confidently reports that it's done. Two weeks later an empty list comes through, and the code blows up or, worse, quietly returns something meaningless. The "empty list" edge never appeared on your test bench, so it never appeared in the agent's head either.
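A minimal sketch of that scenario, in Python. The function names and the latency example are hypothetical, chosen only for illustration; the point is the three different behaviours on an empty list.

```python
def average_latency(samples):
    """Naive version: correct for any non-empty list, crashes on an empty one."""
    return sum(samples) / len(samples)  # len([]) == 0 -> ZeroDivisionError


def average_latency_quiet(samples):
    """Hides the edge instead of handling it: an empty list yields 0.0,
    which is indistinguishable from a real measurement."""
    return sum(samples) / max(len(samples), 1)


def average_latency_checked(samples):
    """Makes the edge explicit: an empty list has no average, so say so."""
    if not samples:
        raise ValueError("average_latency_checked: no samples to average")
    return sum(samples) / len(samples)


if __name__ == "__main__":
    sample = [100, 200, 300]
    # All three agree on the nice case, which is why the nice case proves little.
    print(average_latency(sample))
    print(average_latency_quiet(sample))
    print(average_latency_checked(sample))
```

On the sample list the three versions are identical. Only the empty list tells them apart: the first crashes, the second returns a plausible-looking number, and the third fails with a message that names the problem.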

The scary part isn't that it falls. It's that it falls in the same confident voice it used when it was right. No little frown of "hmm, this one's odd." To an agent, the edge case and the common case look identical — until the result turns out wrong.

01 Rare doesn't mean cheap

Common case (99% of runs): agent handles it well

Edge case (1% of runs): rare, but expensive when it blows up

That 1% is easy to wave off while testing: "it's rare, deal with it later." But low frequency doesn't mean light consequences. An edge case that reaches production is often the one that corrupts a data stream, puts a wrong number on a report, or causes the two-a.m. incident. You save five minutes in testing, then pay for it with an afternoon of debugging.

02 Test with bad cases, not just nice ones

The fix isn't a smarter agent; it's changing what you test it with. The human instinct is to test with the nice example, because the nice example is easy to think up and easy to see as "right." But being right on the nice case says almost nothing about robustness in the wild.

So before trusting something an agent made, actively ask: what's the edge here? Which field could be empty, null, unusually long? What does an extreme input look like? What happens when two things arrive at once? Then throw exactly those at it — or make it list the edge cases of the very thing it just built.
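One way to make those questions a routine is a small probe that runs the code against a fixed set of awkward inputs and records what happens. The sketch below is Python; `probe_edges`, `EDGE_CASES`, and the `total` function under test are hypothetical names, and the case list is a starting point, not a complete catalogue.

```python
def probe_edges(fn, cases):
    """Run fn on each named input; record the result or the exception type."""
    report = {}
    for name, value in cases.items():
        try:
            report[name] = ("ok", fn(value))
        except Exception as exc:  # deliberately broad: we want to see every failure
            report[name] = ("raised", type(exc).__name__)
    return report


# Hypothetical edge inputs for a function that expects a list of numbers.
EDGE_CASES = {
    "empty": [],
    "none": None,
    "single": [7],
    "very_long": list(range(100_000)),
    "mixed_types": [1, "2", 3],
}


def total(items):
    """Stand-in for the code the agent wrote."""
    return sum(items)


if __name__ == "__main__":
    for name, outcome in probe_edges(total, EDGE_CASES).items():
        print(f"{name:12} {outcome}")
```

An "ok" row is not automatically a pass: the returned value still has to be checked for sense, since a quiet wrong answer is the more expensive failure.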

One tight question to make a habit of: "where does this break if the input isn't as nice as what I just tested?" Ask it while still in the chat and it's cheap. Let production ask it for you and it's expensive — and it always asks at the worst time.
